Preface


In the first two editions of the book “Probability,” which appeared in 1980 and 
1989 (see [118]), and were translated into English in 1984 and 1990 (see [119]), all 
chapters were supplemented with a fairly comprehensive and diverse set of relevant 
exercises. The next two (considerably revised and expanded) editions appeared in 
2004 and 2007 (see [121]) in two volumes entitled “Probability 1” and “Probability 
2.” While the work on the third edition was still in progress, it was determined 
that it would be more appropriate to publish a separate book that includes all “old” 
exercises, i.e., exercises included in the previous two editions, and many “new” 
exercises, i.e., exercises, which, for one reason or another, were not included in 
any of the previous editions (the main reason being the constraint on the size
of the volume that could go to print). This is how the present volume “Problems
in Probability” came to life. For the most part, this book includes problems and
exercises that I have created, collected and compiled over the course of many years, 
while working on topics and subjects that interested me the most. These problems 
derive from a rather diverse set of sources: textbooks, lecture notes, exercise 
manuals, monographs, research papers, private communications and such. Some 
of the problems came out of discussions that took place during special seminars for 
graduate and undergraduate students in which I was involved. 

It is impossible to cite here with complete accuracy all of the original sources 
from which the problems and the exercises are derived. The bibliography included 
at the end of the book and the citations throughout the main text are simply the result 
of my best effort to give credit where credit is due. 


I would like to draw the reader’s attention to the appendix included at the end of 
the book. I strongly recommend that anyone using this book become familiar,
at least in passing, with the material included in the Appendix. There are two
reasons for this recommendation. First, the appendix contains a summary of the 
main results, notation and terminology from probability theory, that are used not 
only throughout this book, but also throughout the books “Probability.” Second, 
the appendix contains additional material from combinatorics, potential theory and 
Markov chains, which is not covered in these books, but is nevertheless needed for
many of the exercises included here. 

The following referencing conventions are adopted throughout the book: 

(a) All references to the books “Probability” (see [121]) start with the token P.
For example, “[P §1.1, 3]” points to Part 3 of Sect. 1 in Chap. 1 in [121], while “[P §2.6,
(6)]” and “[P §2.6, Theorem 6]” point, respectively, to (6) and Theorem 6 from
Sect. 6 in Chap. 2 in [121], and so on.

(b) Problems included in this book are referenced, for example, as “Prob- 
lem 1.4.9,” which points to Problem 1.4.9 from Sect. 1.4 in Chap. 1 below. 


The reader must be forewarned that the problems and the exercises collected in 
the present volume differ from each other in nature: 


(a) Some problems are simply meant to test the reader’s understanding of the 
basic concepts and facts from the books “Probability.” For example, the exercises 
from Sects. 1.1 and 1.2 in Chap. 1 relate to the various combinatorial methods for 
counting the favorable outcomes of an event and illustrate the important notions of 
partial factorial $(N)_n$, combinations $C_N^n$ and $C_{N+n-1}^n$, Catalan numbers $C_n$, Stirling
numbers of the first and second kind $s_N^n$ and $S_N^n$, Bell numbers $B_N$, Fibonacci
numbers $F_n$, etc.

(b) Other problems are of a medium-to-high degree of difficulty and require 
more creative thinking. A good example is Problem 7.4.3, which asks for a
unified proof of Lebesgue’s dominated convergence theorem and Lévy’s theorem on the
convergence of conditional expectations.

(c) Some of the problems are meant to develop additional theoretical concepts 
and tools that supplement the material covered in the books “Probability,” or simply 
to familiarize the reader with various facts that, typically, are not covered in the 
mainstream texts in probability theory, but are nevertheless “good to know”—or at 
least good to know that such results exist and be aware of the respective sources. 
One such example is M. Suslin’s result (see Problem 2.2.27 below), which states 
that the projection of a Borel set in the plane onto one of the coordinate axes may 
not be a Borel set inside the real line, or the result describing the set-operations 
that allow one to produce the smallest algebra or σ-algebra that contains a given
collection of sets—see Problems 2.2.25, 2.2.26 and 2.2.32. One must realize that, in 
fact, many problems of this type represent fairly difficult theorems. The formulation 
of such theorems in the form of exercises has the goal of inviting the reader to think 
and to ask questions like: how does one construct a σ-algebra anyway? The answer
to this and similar questions is of paramount importance in the study of models and 
phenomena that pertain to what one may call “non-elementary probability theory.” 

(d) Some of the problems are related to the passage from random walks to Brow- 
nian motions and Brownian bridges—see Sect. 3.4, for example. The statements in 
these problems are intimately related to what is known as the “invariance principle” 
and may be viewed as some sort of a prelude by way of problems and exercises to 
the general theory of stochastic processes in continuous time and, in particular, to 
the functional limit theorems. 

Many (but not all, by far) of the problems included in this book contain hints and
other relevant comments. I very much hope that these hints and comments will be 
helpful not only for deriving the solution, but also for learning how to think about 
the related concepts and problems. 

Over nearly 50 years several of my colleagues at MSU have published exercise 
manuals in probability theory that have been in continuous use in courses offered at 
MSU, as well as in other institutions of higher education. I would like to mention 
them: 


1963 — L. D. Meshalkin. Exercise Manual in Probability Theory. Moscow 
University Press, Moscow; 

1980 — B. A. Sevastyanov, V. P. Chistyakov, A. M. Zubkov. Exercise Manual in 
Probability Theory. Nauka, Moscow; 

1986 — A. V. Prohorov, V. G. Ushakov, N. G. Ushakov. Exercises in Probability 
Theory: Basic Notions, Limit Theorems, Random Processes. Nauka, Moscow; 
1989 — A. M. Zubkov, B. A. Sevastyanov, V. P. Chistyakov. Exercise Manual in 
Probability Theory, 2nd ed. Nauka, Moscow;

1990 — M. V. Kozlov. Elements of Probability Theory Through Examples and 
Exercises. Moscow University Press, Moscow. 


Since this last book was published nearly 15 years ago, the curriculum in most 
graduate-level courses in probability theory has changed considerably. Some new 
directions have emerged, new areas of research were developed, and new problems 
were formulated. An earnest effort was made to adequately reflect these changes in 
the books “Probability” and, naturally, in the present volume, which first appeared 
in 2006. At the same time, the traditional coverage of all classical domains of 
probability theory was kept intact. In the end, more than 1,500 problems (counting
the various parts of problems) found their way into the present volume. 

As was the case with the books “Probability,” the final edit, arrangement and 
proof-reading of the text was done by Tatyana Borisovna Tolozova, to whom I am 
deeply indebted. 

Finally, I would like to express my gratitude to Andrew Lyasoff not only for
translating the present volume into English, but also for making a number of 
corrections in the original and for enriching the text with many comments and 
clarifications. 


Moscow Albert N. Shiryaev 

Chapter 1
Elementary Probability Theory 


1.1 Probabilistic Models for Experiments with Finitely 
Many Outcomes 


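Problem 1.1.1. Verify the following relations involving the operations $\cap$ (intersection)
and $\cup$ (union):

$A \cup B = B \cup A$, $A \cap B = B \cap A$ (commutativity),
$A \cup (B \cup C) = (A \cup B) \cup C$, $A \cap (B \cap C) = (A \cap B) \cap C$ (associativity),
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$, $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ (distributivity),
$A \cup A = A$, $A \cap A = A$ (idempotent property of $\cap$ and $\cup$).


Then prove that
$$\overline{A \cup B} = \overline{A} \cap \overline{B} \quad\text{and}\quad \overline{A \cap B} = \overline{A} \cup \overline{B},$$
where the overline $\overline{\phantom{A}}$ stands for the operation “complement of a set.”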


Problem 1.1.2. (Various interpretations of the partial factorial $(N)_n = N(N-1) \cdots (N-n+1)$,
i.e., the number of permutations “$N$ take $n$”; see Sect. A.1.) Prove
that:

(a) The number of all ordered samples (...) without replacement (equivalently,
samples without repetition) of size $n$ drawn from any finite set $A$ of size $|A| = N$,
$1 \le n \le N$, equals $(N)_n$.

(b) The number of all words of length $n$ composed from different letters selected
from an alphabet that consists of $N$ letters, $1 \le n \le N$, equals $(N)_n$.

(c) Given a finite set $X$ of size $|X| = n$ and a finite set $Y$ of size $|Y| = N$,
$n \le N$, the number of all functions $f \colon X \to Y$ such that if $x_1, x_2 \in X$ and $x_1 \ne x_2$
then $f(x_1) \ne f(x_2)$ (i.e., the number of all injections from $X$ to $Y$) equals $(N)_n$.


Problem 1.1.3. (Various interpretations of the binomial coefficients $C_N^n = \frac{N!}{n!\,(N-n)!}$;
see Sect. A.1.) Prove that:

(a) The number of all unordered samples [...] without replacement (equivalently,
samples without repetition) of size $n$, drawn from any finite set $A$ of size $|A| = N$,
$1 \le n \le N$, equals $C_N^n$.

(b) The number of all ordered finite 0-1-sequences (...) of length $N$ that contain
exactly $n$ 1’s and exactly $(N - n)$ 0’s, $1 \le n \le N$, equals $C_N^n$.

(c) The number of all possible placements of $n$ indistinguishable particles into $N$
distinguishable cells, $1 \le n \le N$, in such a way that each cell can contain at most
one particle (the so-called “placement with locks”), equals $C_N^n$.

(d) The number of all possible nondecreasing paths on the two-dimensional
lattice $\mathbb{Z}_+^2 = \{(i,j) : i,j = 0,1,2,\ldots\}$ that start from the point $(0, 0)$ and end at
the point $(n, N - n)$, $0 \le n \le N$, equals $C_N^n$ (a path on the two-dimensional lattice
is said to be nondecreasing if at each step the path moves either up by $+1$ or to the
right by $+1$; notice that $C_N^0 = 1$).

(e) The number of all different subsets $D$ of size $|D| = n$ that are contained in
some finite set $A$ of size $|A| = N$, $n \le N$, equals $C_N^n$.

Hint. Assuming that (a) has already been established, one can establish
(b), (c), (d) and (e) by proving the equivalences (a) $\Longleftrightarrow$ (b), (a) $\Longleftrightarrow$
(c), and so on, exactly as is done in [P §1.1, Example 6].
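
The interpretations in this problem can be sanity-checked by brute force for small parameters. The sketch below is only an illustration, not part of the book: the function names are hypothetical, and `math.comb(N, n)` plays the role of the binomial coefficient written $C_N^n$ in the text.

```python
from itertools import combinations, product
from math import comb


def count_subsets(N, n):
    """Unordered samples without replacement: n-element subsets of {0, ..., N-1}."""
    return sum(1 for _ in combinations(range(N), n))


def count_binary_sequences(N, n):
    """0-1 sequences of length N that contain exactly n ones."""
    return sum(1 for seq in product((0, 1), repeat=N) if sum(seq) == n)


def count_lattice_paths(right, up):
    """Nondecreasing paths from (0, 0) to (right, up), counted by the last step."""
    if right == 0 or up == 0:
        return 1
    return count_lattice_paths(right - 1, up) + count_lattice_paths(right, up - 1)


# Compare the three counts with the binomial coefficient for small N.
for N in range(1, 9):
    for n in range(0, N + 1):
        assert count_subsets(N, n) == comb(N, n)
        assert count_binary_sequences(N, n) == comb(N, n)
        assert count_lattice_paths(n, N - n) == comb(N, n)
```

Such a check proves nothing, of course, but it helps to catch a misread index before attempting the bijective proofs suggested in the hint.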


Problem 1.1.4. Similarly to Part (d) in the previous problem, consider the class
of all nondecreasing paths on the lattice $\mathbb{Z}_+^2 = \{(i, j) : i,j = 0,1,2,\ldots\}$ that
start from the point $(0, 0)$ and end at the point $(n, n)$, while never moving above the
diagonal, i.e., all paths that go from $(0,0)$ to $(n,n)$ and remain in the set
$\{(i, j) \in \mathbb{Z}_+^2 : 0 \le j \le i \le n\}$. Prove that the number of paths in this class is given by the
$(n+1)$-st Catalan number $C_{n+1}$, the $n$-th Catalan number, $n \ge 1$, being defined as

$$C_n = \frac{1}{n}\, C_{2(n-1)}^{n-1}.$$

Note: Sometimes the Catalan numbers are defined as $c_n = \frac{1}{n+1}\, C_{2n}^{n}$ $(= C_{n+1})$,
$n \ge 0$ (see, for example, [6]).
Prove that $C_1, \ldots, C_9$ equal, respectively, 1, 1, 2, 5, 14, 42, 132, 429, 1430.
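
As a numerical companion to this problem, the following sketch counts the admissible paths by dynamic programming and compares the result with the closed form $C_n = \frac{1}{n} C_{2(n-1)}^{n-1}$. The function names are illustrative assumptions, not notation from the book.

```python
from math import comb


def catalan(n):
    """n-th Catalan number in the indexing used here: C_n = C(2(n-1), n-1) / n, n >= 1."""
    return comb(2 * (n - 1), n - 1) // n


def paths_below_diagonal(n):
    """Count up/right lattice paths from (0, 0) to (n, n) inside {0 <= j <= i <= n}."""
    # ways[i][j] = number of admissible paths from (0, 0) to (i, j);
    # cells above the diagonal (j > i) are never filled and stay 0.
    ways = [[0] * (n + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for i in range(n + 1):
        for j in range(i + 1):
            if i == 0 and j == 0:
                continue
            from_left = ways[i - 1][j] if i > 0 else 0
            from_below = ways[i][j - 1] if j > 0 else 0
            ways[i][j] = from_left + from_below
    return ways[n][n]


# The first nine Catalan numbers C_1, ..., C_9.
first_nine = [catalan(n) for n in range(1, 10)]
assert first_nine == [1, 1, 2, 5, 14, 42, 132, 429, 1430]

# The number of paths to (n, n) is the (n+1)-st Catalan number.
for n in range(0, 12):
    assert paths_below_diagonal(n) == catalan(n + 1)
```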


Problem 1.1.5. The Catalan numbers $C_n$, $n \ge 1$, show up in many combinatorial
problems. Consider, for example, the number of binary bracketings of $n$ letters,
that is, the number of all possible ways in which one can compute the sum of $n$
numbers that are arranged in a row by adding only 2 neighboring numbers at a
time. For instance, one can compute the sum $a + b + c$ either as $((a + b) + c)$ or
as $(a + (b + c))$. Thus, the number of binary bracketings of three letters equals 2. It
is not hard to see that there is a total of 5 binary bracketings of 4 letters:

$$\begin{aligned}
a+b+c+d &= ((a+b)+(c+d)) \\
&= (((a+b)+c)+d) \\
&= ((a+(b+c))+d) \\
&= (a+((b+c)+d)) \\
&= (a+(b+(c+d))).
\end{aligned}$$

(a) Prove that for any integer $n \ge 3$, the number of all binary bracketings of $n$
letters is given by the Catalan number $C_n$.

(b) Consider Euler’s polygon division problem: In how many different ways can a
plane convex polygon of $n$ sides, $n \ge 4$, be divided into triangles by non-intersecting
diagonals? Prove that the answer to Euler’s polygon division problem is given by the
Catalan number $C_{n-1}$.

Hint. If a convex $n$-gon is divided into (non-intersecting) triangles whose
vertices are also vertices of the $n$-gon, then there are exactly $(n - 2)$
triangles, and any such division corresponds to a choice of $(n - 3)$ non-intersecting
diagonals.

(c) Consider the numbers $C_n^*$, $n \ge 1$, defined recursively by the relations:

$$C_0^* = 0, \quad C_1^* = 1 \quad\text{and}\quad C_n^* = \sum_{i=1}^{n-1} C_i^*\, C_{n-i}^* \quad\text{for } n > 1. \qquad (*)$$

Prove that, for any $n \ge 1$, the number $C_n^*$ coincides with the $n$-th Catalan number
$C_n$; in other words, prove that the Catalan numbers can be defined equivalently by
way of the recursive relation (*).

(d) Prove that the generating function $F^*(x) = \sum_{n \ge 1} C_n^* x^n$, associated with
the sequence $(C_n^*)_{n \ge 1}$ defined by the recursive relation (*) above, satisfies the
following relation:

$$F^*(x) = x + (F^*(x))^2.$$

(e) By taking into account that $F^*(0) = 0$, prove that

$$F^*(x) = \frac{1}{2}\bigl(1 - (1 - 4x)^{1/2}\bigr), \quad |x| < \frac{1}{4},$$

and conclude that, just as one would expect, the coefficients $C_n^*$ in the expansion of
the function $F^*(x)$ coincide with the Catalan numbers $C_n$:

$$C_n^* = -\frac{1}{2}\, C_{1/2}^{n}\, (-4)^n = \frac{1}{n}\, C_{2(n-1)}^{n-1} = C_n.$$

(For the definition of the quantity $C_{1/2}^n$ see Problem 1.2.22.)
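
Parts (c) to (e) lend themselves to a quick numerical check. The sketch below (helper names are hypothetical) builds the sequence from the convolution recursion $C_n^* = \sum_{i=1}^{n-1} C_i^* C_{n-i}^*$, compares it with $\frac{1}{n} C_{2(n-1)}^{n-1}$, and then tests the functional equation $F^*(x) = x + (F^*(x))^2$ at one point inside the disc $|x| < 1/4$.

```python
from math import comb, sqrt


def catalan_by_recursion(n_max):
    """Return [C*_0, ..., C*_{n_max}] from C*_0 = 0, C*_1 = 1 and the convolution rule."""
    c = [0, 1]
    for n in range(2, n_max + 1):
        c.append(sum(c[i] * c[n - i] for i in range(1, n)))
    return c[: n_max + 1]


n_max = 40
c_star = catalan_by_recursion(n_max)
closed_form = [0] + [comb(2 * (n - 1), n - 1) // n for n in range(1, n_max + 1)]
assert c_star == closed_form

# Numerical check of the generating function at x = 0.1 (inside |x| < 1/4).
x = 0.1
series_value = sum(c_star[n] * x ** n for n in range(1, n_max + 1))
closed_value = 0.5 * (1.0 - sqrt(1.0 - 4.0 * x))
assert abs(series_value - closed_value) < 1e-9
assert abs(closed_value - (x + closed_value ** 2)) < 1e-12
```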


Problem 1.1.6. (Various interpretations of the binomial coefficients $C_{N+n-1}^{n}$.)
Prove that:

(a) The number of all unordered samples with replacement [...] of size $n$ drawn
from any finite set $A$ of size $|A| = N$ equals $C_{N+n-1}^{n}$.

(b) The number of all ordered lists $(n_1, \ldots, n_N)$, $N \ge 1$, whose entries $n_i$, $i =
1, \ldots, N$, are non-negative integers that satisfy the relation $n_1 + \dots + n_N =
n$, for some fixed $n \ge 1$, equals $C_{N+n-1}^{n}$.

(c) The number of all possible placements of $n \ge 1$ indistinguishable particles
into $N \ge 1$ distinguishable cells without a restriction on the number of particles in
each cell (the so-called “placement without locks”) equals $C_{N+n-1}^{n}$.

Hint. Follow the hint to Problem 1.1.3.
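
For small $N$ and $n$ the count $C_{N+n-1}^{n}$ can be confirmed by direct enumeration. The following is a minimal sketch with hypothetical function names: unordered samples with replacement are enumerated with `itertools.combinations_with_replacement`, and occupancy vectors $(n_1, \ldots, n_N)$ by filtering all candidate vectors.

```python
from itertools import combinations_with_replacement, product
from math import comb


def count_multisets(N, n):
    """Unordered samples with replacement of size n from a set of N elements."""
    return sum(1 for _ in combinations_with_replacement(range(N), n))


def count_occupancy_vectors(N, n):
    """Ordered lists (n_1, ..., n_N) of non-negative integers with n_1 + ... + n_N = n."""
    return sum(1 for v in product(range(n + 1), repeat=N) if sum(v) == n)


for N in range(1, 6):
    for n in range(1, 6):
        expected = comb(N + n - 1, n)
        assert count_multisets(N, n) == expected
        assert count_occupancy_vectors(N, n) == expected
```

The map sending a multiset to the vector of multiplicities of its elements is the bijection behind the equality of the two counts.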


Problem 1.1.7. (Continuation of Part (b) in the previous problem.) Given some
fixed integers $N \ge 1$ and $n \ge 1$, consider the collection of all unordered
solutions $[n_1, \ldots, n_N]$ to the equation $n_1 + \dots + n_N = n$ in terms of
non-negative integers $n_i \ge 0$, $i = 1, \ldots, N$. What is the total number of solutions in
this collection? What is the total number of all (still unordered) strictly positive
solutions $n_i > 0$, $i = 1, \ldots, N$? What is the total number of all ordered solutions
$(n_1, \ldots, n_N)$ to the same equation, $n_1 + \dots + n_N = n$, in terms of positive integers
$n_i > 0$, $i = 1, \ldots, N$?


Problem 1.1.8. (Continuation of Part (b) in Problem 1.1.6 and Problem 1.1.7.)
Given some fixed integers $n \ge 1$ and $N \ge 1$, consider the inequality
$n_1 + \dots + n_N \le n$. Count the total number of all ordered solutions $(n_1, \ldots, n_N)$ and the total number
of all unordered solutions $[n_1, \ldots, n_N]$ to this inequality, in terms of non-negative
or strictly positive integers $n_i$, $i = 1, \ldots, N$.


Problem 1.1.9. Prove that:
(a) The maximal number of disjoint regions in the plane $\mathbb{R}^2$, determined by $n$
different lines that are placed arbitrarily in the plane $\mathbb{R}^2$, equals

$$1 + \frac{n(n+1)}{2}.$$

(b) The maximal number of disjoint regions in the space $\mathbb{R}^3$, determined by $n$
different planes that are placed arbitrarily in the space $\mathbb{R}^3$, equals

$$\frac{1}{6}\,(n^3 + 5n + 6).$$
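
One standard route to these formulas (an assumption of this sketch, not necessarily the argument intended by the author) uses recurrences for configurations in general position: the $n$-th line is cut by the previous $n-1$ lines into $n$ pieces, each of which splits one region, and the $n$-th plane is cut by the previous planes into as many planar regions as $n-1$ lines produce. The code below iterates these recurrences and compares them with the closed forms $1 + n(n+1)/2$ and $(n^3 + 5n + 6)/6$.

```python
def plane_regions(n):
    """Regions of the plane cut by n lines in general position: L(n) = L(n-1) + n."""
    regions = 1
    for k in range(1, n + 1):
        regions += k  # the k-th line is split into k pieces, each adds one region
    return regions


def space_regions(n):
    """Regions of space cut by n planes in general position: P(n) = P(n-1) + L(n-1)."""
    regions = 1
    for k in range(1, n + 1):
        regions += plane_regions(k - 1)  # traces of earlier planes on the k-th plane
    return regions


for n in range(0, 30):
    assert plane_regions(n) == 1 + n * (n + 1) // 2
    assert 6 * space_regions(n) == n ** 3 + 5 * n + 6
```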


Problem 1.1.10. Suppose that $A$ and $B$ are any two subsets of the set $\Omega$. Prove that
the algebra $\alpha(A, B)$ generated by these two sets (i.e., following the terminology
introduced in [P §1.1, 3], the algebra generated by the system $\mathscr{A}_0 = \{A, B\}$)
consists of the following $N(2) = 16$ subsets of $\Omega$:

$$\{A,\ B,\ \overline{A},\ \overline{B},\ A \cap B,\ \overline{A} \cap \overline{B},\ A \setminus B,\ B \setminus A,$$
$$A \cup B,\ A \cup \overline{B},\ \overline{A} \cup B,\ \overline{A} \cup \overline{B},\ A \triangle B,\ \overline{A \triangle B},\ \Omega,\ \varnothing\},$$

where $A \triangle B = (A \setminus B) \cup (B \setminus A)$ is the so-called “symmetric difference” of the
sets $A$ and $B$ (see [P §2.1, Table 1]).

Find those partitions $\mathscr{D}$ (see [P §1.1, 3]) of the set $\Omega$ for which the algebra
$\alpha(\mathscr{D})$, i.e., the algebra generated by $\mathscr{D}$, coincides with $\alpha(A, B)$.

Finally, prove that the algebra $\alpha(A_1, \ldots, A_n)$, generated by the system $\mathscr{A}_0 =
\{A_1, \ldots, A_n\}$, where $A_i \subseteq \Omega$, $i = 1, \ldots, n$, consists of $N(n) = 2^{2^n}$ different
subsets of $\Omega$, so that $N(2) = 16$, $N(3) = 256$, and so on.
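
The counts $N(2) = 16$ and $N(3) = 256$ can be reproduced by closing a family of sets under complements and unions. The sketch below is illustrative only; it assumes sets in "general position" (all $2^n$ atoms non-empty), realized on the hypothetical sample space $\{0, \ldots, 2^n - 1\}$ with $A_i$ the set of points whose $i$-th binary digit is 1.

```python
from itertools import combinations


def generated_algebra(omega, generators):
    """Smallest family containing the generators and closed under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    while True:
        new_sets = {omega - s for s in family}
        new_sets |= {s | t for s, t in combinations(family, 2)}
        if new_sets <= family:  # nothing new: the family is closed
            return family
        family |= new_sets


def generic_sets(n):
    """Sample space {0, ..., 2^n - 1}; A_i = points whose i-th bit equals 1."""
    omega = range(2 ** n)
    return omega, [{w for w in omega if (w >> i) & 1} for i in range(n)]


for n in (1, 2, 3):
    omega, generators = generic_sets(n)
    assert len(generated_algebra(omega, generators)) == 2 ** (2 ** n)
```

When some atoms are empty (for instance, when $A \subseteq B$), the same closure procedure returns fewer sets, which is a useful reminder that $2^{2^n}$ is the count for the generic case.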


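Problem 1.1.11. Prove Boole’s inequalities

$$\text{(a)}\quad P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le \sum_{i=1}^{n} P(A_i), \qquad P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr) \ge 1 - \sum_{i=1}^{n} P(\overline{A}_i).$$

Prove that for any integer $n \ge 1$ the following inequality is in force

$$\text{(b)}\quad P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr) \ge \sum_{i=1}^{n} P(A_i) - (n - 1).$$

Prove the Kounias inequality

$$\text{(c)}\quad P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le \min_{k} \Bigl\{ \sum_{i=1}^{n} P(A_i) - \sum_{i \ne k} P(A_i \cap A_k) \Bigr\}.$$

Prove the Chung–Erdős inequality

$$\text{(d)}\quad P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \ge \frac{\bigl(\sum_{i=1}^{n} P(A_i)\bigr)^2}{\sum_{i,j=1}^{n} P(A_i \cap A_j)}.$$

Hint. With $n = 3$ Part (b) comes down to the inequality
$P(A_1 \cap A_2 \cap A_3) \ge P(A_1) + P(A_2) + P(A_3) - 2$, which can be established with elementary
considerations. The general case can then be handled by using induction with
respect to $n$. To prove (c), it is enough to establish that the following inequality
holds:

$$P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le \sum_{i=1}^{n} P(A_i) - \sum_{2 \le i \le n} P(A_1 \cap A_i).$$

This inequality, too, can be established by induction.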
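
Before proving inequalities of this kind it can be reassuring to test them on randomly generated finite models. The sketch below is an illustration under stated assumptions: the finite sample space, the random weights and all function names are hypothetical, and the four inequalities are written in their usual textbook forms (Boole, the bound $P(\bigcap A_i) \ge \sum P(A_i) - (n-1)$, Kounias, Chung–Erdős). Exact rational arithmetic avoids rounding issues.

```python
import random
from fractions import Fraction


def random_model(num_points, num_events, seed):
    """A random probability vector on {0, ..., num_points-1} and random events."""
    rng = random.Random(seed)
    weights = [rng.randint(1, 10) for _ in range(num_points)]
    total = sum(weights)
    prob = [Fraction(w, total) for w in weights]
    events = [frozenset(w for w in range(num_points) if rng.random() < 0.4)
              for _ in range(num_events)]
    return prob, events


def prob_of(prob, event):
    return sum((prob[w] for w in event), Fraction(0))


def check_inequalities(prob, events):
    n = len(events)
    omega = frozenset(range(len(prob)))
    p_union = prob_of(prob, frozenset().union(*events))
    p_inter = prob_of(prob, omega.intersection(*events))
    s1 = sum((prob_of(prob, a) for a in events), Fraction(0))
    # (a) Boole's inequalities
    assert p_union <= s1
    assert p_inter >= 1 - sum(prob_of(prob, omega - a) for a in events)
    # (b) lower bound for the intersection
    assert p_inter >= s1 - (n - 1)
    # (c) Kounias inequality
    kounias = min(
        s1 - sum(prob_of(prob, events[i] & events[k]) for i in range(n) if i != k)
        for k in range(n)
    )
    assert p_union <= kounias
    # (d) Chung-Erdos inequality (skipped when all events are empty)
    double_sum = sum(prob_of(prob, a & b) for a in events for b in events)
    if double_sum > 0:
        assert p_union >= s1 ** 2 / double_sum
    return True


for seed in range(50):
    check_inequalities(*random_model(12, 4, seed))
```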


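Problem 1.1.12. Prove the “inclusion-exclusion formulas” (also known as
Poincaré’s formulas, Poincaré’s theorems, Poincaré’s identities) for the probability
of a union of events and the probability of an intersection of events; namely, prove
that, for any $n \ge 1$ and for any choice of the events $A_1, \ldots, A_n$, one has:

$$\text{(a)}\quad P(A_1 \cup \dots \cup A_n) = \sum_{m=1}^{n} (-1)^{m+1} \sum_{1 \le i_1 < \dots < i_m \le n} P(A_{i_1} \cap \dots \cap A_{i_m})$$
$$= \sum_{1 \le i_1 \le n} P(A_{i_1}) - \sum_{1 \le i_1 < i_2 \le n} P(A_{i_1} \cap A_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le n} P(A_{i_1} \cap A_{i_2} \cap A_{i_3})$$
$$+ \dots + (-1)^{n+1} P(A_1 \cap \dots \cap A_n)$$

and

$$\text{(b)}\quad P(A_1 \cap \dots \cap A_n) = \sum_{m=1}^{n} (-1)^{m+1} \sum_{1 \le i_1 < \dots < i_m \le n} P(A_{i_1} \cup \dots \cup A_{i_m})$$
$$= \sum_{1 \le i_1 \le n} P(A_{i_1}) - \sum_{1 \le i_1 < i_2 \le n} P(A_{i_1} \cup A_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le n} P(A_{i_1} \cup A_{i_2} \cup A_{i_3})$$
$$+ \dots + (-1)^{n+1} P(A_1 \cup \dots \cup A_n).$$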


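Note 1. The formula in Part (a) is often written in the form

$$P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = S_1 - S_2 + \dots + (-1)^{n-1} S_n,$$

where

$$S_m = \sum_{1 \le i_1 < \dots < i_m \le n} P(A_{i_1} \cap \dots \cap A_{i_m}),$$

while the formula in Part (b) is often written in the form

$$P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr) = \widetilde{S}_1 - \widetilde{S}_2 + \dots + (-1)^{n-1} \widetilde{S}_n,$$

where

$$\widetilde{S}_m = \sum_{1 \le i_1 < \dots < i_m \le n} P(A_{i_1} \cup \dots \cup A_{i_m}).$$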

Note 2. Although the inclusion-exclusion formulas are considered here in the
context of [P Chap. 1], which deals only with finite probability spaces $(\Omega, \mathscr{A}, P)$,
it is important to recognize that, in fact, these formulas are valid on any (finite or
infinite, countable or uncountable) probability space $(\Omega, \mathscr{F}, P)$, regardless of its
nature (see [P Chap. 2]).

Nevertheless, in order to use the inclusion-exclusion formulas in concrete
situations, one must be able to compute somehow the quantities $S_m$, or, which
amounts to the same, the probabilities $P(A_{i_1} \cap \dots \cap A_{i_m})$. Usually, such computations
take into account the concrete probabilistic structure of the model encoded in
the space $(\Omega, \mathscr{A}, P)$; the models associated with the Bose-Einstein statistics and
the Fermi-Dirac statistics illustrate this point rather well.

Hint. The formula in Part (a) can be established by induction with respect to the
number of events $n$, after showing first that for $n = 2$ one has

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2).$$

(See also Problem 1.4.9.)
To prove the formula in Part (b), notice first that

$$P\Bigl(\bigcap_{i=1}^{n} A_i\Bigr) = 1 - P\Bigl(\bigcup_{i=1}^{n} \overline{A}_i\Bigr),$$

and then apply the formula from Part (a) to the events $\overline{A}_1, \ldots, \overline{A}_n$ instead of the
events $A_1, \ldots, A_n$.


Note 3. Anticipating the use of the inclusion-exclusion formulas later in
this book, notice that $P\bigl(\bigcap_{i=1}^{n} \overline{A}_i\bigr)$ is the probability that none of the events
$A_1, \ldots, A_n$ occurs.
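
Both inclusion-exclusion formulas can be tested exactly on a small finite probability space. The following sketch rests on illustrative assumptions (random rational weights on a hypothetical ten-point space, made-up function names): it evaluates the alternating sum $\sum_{m} (-1)^{m+1} \sum_{i_1 < \dots < i_m} P(\cdot)$ once with intersections inside, to recover the probability of the union, and once with unions inside, to recover the probability of the intersection.

```python
import random
from fractions import Fraction
from itertools import combinations


def prob_of(prob, event):
    return sum((prob[w] for w in event), Fraction(0))


def alternating_sum(prob, events, combine):
    """Sum over m of (-1)^(m+1) times the sum of P(combine(A_i1, ..., A_im))."""
    total = Fraction(0)
    for m in range(1, len(events) + 1):
        s_m = sum((prob_of(prob, combine(*subset))
                   for subset in combinations(events, m)), Fraction(0))
        total += (-1) ** (m + 1) * s_m
    return total


def check_inclusion_exclusion(num_points=10, num_events=4, seed=0):
    rng = random.Random(seed)
    weights = [rng.randint(1, 9) for _ in range(num_points)]
    prob = [Fraction(w, sum(weights)) for w in weights]
    events = [frozenset(w for w in range(num_points) if rng.random() < 0.5)
              for _ in range(num_events)]
    union = frozenset.union(*events)
    intersection = frozenset.intersection(*events)
    # Formula (a): intersections inside the sums give the probability of the union.
    ok_a = prob_of(prob, union) == alternating_sum(prob, events, frozenset.intersection)
    # Formula (b): unions inside the sums give the probability of the intersection.
    ok_b = prob_of(prob, intersection) == alternating_sum(prob, events, frozenset.union)
    return ok_a and ok_b


assert all(check_inclusion_exclusion(seed=s) for s in range(25))
```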


Problem 1.1.13. Let $B_m$ denote the event that exactly $m$ of the events $A_1, \ldots, A_n$
occur, the integers $n \ge m \ge 0$ being fixed. Assuming that the quantities $S_1, \ldots, S_n$
are defined as in Problem 1.1.12, prove that

$$P(B_m) = \sum_{k=m}^{n} (-1)^{k-m}\, C_k^m\, S_k,$$

which can be written also as

$$P(B_m) = S_m - C_{m+1}^{m} S_{m+1} + \dots + (-1)^{n-m} C_n^m S_n,$$

and conclude that the probability $P(B_{\ge m})$ of the event $B_{\ge m}$ that at least $m$ of the
events $A_1, \ldots, A_n$ occur is given by

$$P(B_{\ge m}) = P(B_m) + \dots + P(B_n) = \sum_{k=m}^{n} (-1)^{k-m}\, C_{k-1}^{m-1}\, S_k,$$

which can be written also as

$$P(B_{\ge m}) = S_m - C_{m}^{m-1} S_{m+1} + \dots + (-1)^{n-m} C_{n-1}^{m-1} S_n.$$

Hint. The formula for $P(B_m)$ is known as Waring’s formula and can be proved
by using the method of “inclusion-exclusion,” just as in Problem 1.1.12. Such a
proof can be found, for example, in W. Feller’s book [39, vol. 1, Chap. IV, § 3].
However, readers familiar with the notions of random variables and expected values
(see [P §1.4] and [P §2.6, 4]) may follow these steps:

Given any $i = 1, \ldots, n$, let $X_i = I_{A_i}$ denote the indicator of the event $A_i$ and
consider the sum

$$\sum X_{i_1} \cdots X_{i_m} (1 - X_{j_1}) \cdots (1 - X_{j_{n-m}}), \qquad (*)$$

where the summation is taken over all $C_n^m$ possible choices of the (unordered) list
$[i_1, \ldots, i_m]$ from the list $[1, \ldots, n]$ and

$$[j_1, \ldots, j_{n-m}] = [1, \ldots, n] \setminus [i_1, \ldots, i_m].$$

When evaluated at a particular outcome $\omega$, the sum in (*) is equal to 1 precisely
when $\omega$ belongs to exactly $m$ of the events $A_1, \ldots, A_n$ and is equal to 0 in all
other cases. Consequently, the quantity $P(B_m)$ is nothing but the expected value
of the sum (*). The remaining steps are similar to those described in the hint to
Problem 1.4.9. (See also Problem 2.6.31.)


Problem 1.1.14. By using the formulas for $P(B_m)$ and $P(B_{\ge m})$ obtained in
Problem 1.1.13, derive the Bonferroni formulas: for any even integer $r \ge 2$
one has

$$S_m + \sum_{k=1}^{r+1} (-1)^k\, C_{m+k}^{m}\, S_{m+k} \le P(B_m) \le S_m + \sum_{k=1}^{r} (-1)^k\, C_{m+k}^{m}\, S_{m+k},$$

$$S_m + \sum_{k=1}^{r+1} (-1)^k\, C_{m+k-1}^{m-1}\, S_{m+k} \le P(B_{\ge m}) \le S_m + \sum_{k=1}^{r} (-1)^k\, C_{m+k-1}^{m-1}\, S_{m+k},$$

where the quantities $S_1, \ldots, S_n$ are defined as in Problem 1.1.12.


Hint. One possibility is to prove first the following (also very useful) identities:

$$S_m = \sum_{r=m}^{n} C_r^m\, P(B_r), \qquad S_m = \sum_{r=m}^{n} C_{r-1}^{m-1}\, P(B_{\ge r}).$$


Problem 1.1.15. By using the definition of the quantities $S_1, \ldots, S_n$ given in
Problem 1.1.12, derive:

(a) Bonferroni inequalities (this is a special case of the formulas obtained in the
previous problem): for any integer $k \ge 1$ with the property $2k \le n$, one has

$$S_1 - S_2 + \dots - S_{2k} \le P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) \le S_1 - S_2 + \dots + S_{2k-1}.$$

(b) Fréchet inequality: for any integer $0 \le r \le n - 1$ one has

$$\frac{S_{r+1}}{C_n^{r+1}} \le \frac{S_r}{C_n^{r}}.$$

(c) Gumbel inequality: for any integer $1 \le r \le n - 1$ one has

$$\frac{C_n^{r+1} - S_{r+1}}{C_{n-1}^{r}} \le \frac{C_n^{r} - S_r}{C_{n-1}^{r-1}}.$$


Problem 1.1.16. (“The matching problem.”) Given some fixed integer $n \ge 1$,
consider the set of all possible permutations of the list $(1, \ldots, n)$, suppose that one
permutation is chosen at random from that set and denote this randomly chosen
permutation by $(i_1, \ldots, i_n)$. Assuming that all permutations are equally likely to
occur, i.e., each permutation is chosen with probability $1/n!$, prove that:

(a) The probability $P_{(m)}$ that exactly $m$ of the numbers $1, \ldots, n$, $1 \le m \le n$,
appear in the permutation $(i_1, \ldots, i_n)$ in their own positions (i.e., in the same
positions in which they appear in the list $(1, \ldots, n)$) is given by

$$\frac{1}{m!}\Bigl(1 - \frac{1}{1!} + \frac{1}{2!} - \frac{1}{3!} + \dots + (-1)^{n-m} \frac{1}{(n-m)!}\Bigr) \quad \Bigl(\sim \frac{e^{-1}}{m!},\ n \to \infty\Bigr).$$

(b) The probability $P_{(\ge 1)}$ that at least one of the numbers $1, \ldots, n$ appears in the
permutation $(i_1, \ldots, i_n)$ in its own position is given by

$$1 - \frac{1}{2!} + \frac{1}{3!} - \dots + (-1)^{n-1} \frac{1}{n!} \quad (\sim 1 - e^{-1},\ n \to \infty),$$

and, consequently, the probability for a complete “disorder” (i.e., a situation where
none of the numbers $1, \ldots, n$ appears in its own position in the list $(i_1, \ldots, i_n)$) is
given by $1 - P_{(\ge 1)} = \sum_{j=0}^{n} \frac{(-1)^j}{j!}$ ($\sim e^{-1}$ when $n \to \infty$).

Hint. For any $1 \le i \le n$ let $A_i$ denote the event that the number $i$ is located in
the $i$-th position of the list $(i_1, \ldots, i_n)$. The probability $P_{(m)}$ is then the same as the
probability $P(B_m)$ in Problem 1.1.13, so that

$$P_{(m)} = S_m - C_{m+1}^{m} S_{m+1} + \dots + (-1)^{n-m} C_n^m S_n.$$

Showing that in the present setting one has $S_k = 1/k!$ for any $m \le k \le n$ would
complete the proof.

In order to establish the formula for $P_{(\ge 1)}$, it is enough to notice that, using again
the results from Problem 1.1.13, one has

$$P_{(\ge 1)} = P(B_{\ge 1}) = S_1 - S_2 + S_3 - \dots + (-1)^{n-1} S_n,$$

where, in this case, $S_k = C_n^k\, \frac{(n-k)!}{n!} = \frac{1}{k!}$.
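
For small $n$ the distribution of the number of matches can be obtained by enumerating all $n!$ permutations, which gives an exact check of the closed form $\frac{1}{m!}\sum_{j=0}^{n-m} (-1)^j / j!$ and of Waring's sum $\sum_{k=m}^{n} (-1)^{k-m} C_k^m S_k$ with $S_k = 1/k!$. The sketch below uses hypothetical function names and exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def exact_match_distribution(n):
    """P(exactly m fixed points), m = 0..n, by enumerating all n! permutations."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        fixed_points = sum(1 for pos, val in enumerate(perm) if pos == val)
        counts[fixed_points] += 1
    return [Fraction(c, factorial(n)) for c in counts]


def matching_formula(n, m):
    """(1/m!) * sum_{j=0}^{n-m} (-1)^j / j!"""
    partial = sum(Fraction((-1) ** j, factorial(j)) for j in range(n - m + 1))
    return Fraction(1, factorial(m)) * partial


def waring_sum(n, m):
    """sum_{k=m}^{n} (-1)^(k-m) C(k, m) S_k with S_k = 1/k!"""
    return sum(Fraction((-1) ** (k - m) * comb(k, m), factorial(k))
               for k in range(m, n + 1))


for n in range(1, 7):
    distribution = exact_match_distribution(n)
    for m in range(n + 1):
        assert distribution[m] == matching_formula(n, m) == waring_sum(n, m)
```

The enumeration also covers $m = 0$, the case of complete disorder.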


Problem 1.1.17. (“The absent-minded secretary problem.”) There are $n$ different
letters and $n$ envelopes addressed to the respective recipients of the letters. The
secretary who prepares the letters is absent-minded and stuffs the letters into the
envelopes at random. Assume the “classical,” i.e., equal-likelihood-of-outcomes,
definition of the probabilities involved (see [P §1.1, 5]), and let $P_{(m)}$ denote the
probability that exactly $m$ of the letters will reach their (correct) recipients.


Prove that

$$P_{(m)} = \frac{1}{m!} \sum_{j=0}^{n-m} \frac{(-1)^j}{j!}.$$

Hint. 1. First, one must clarify the assumption that “the secretary stuffs the letters
into the envelopes at random.” If we are to assume that the secretary chooses at
random one of the $n$ envelopes and stuffs the first letter into that envelope, then
chooses at random one envelope from the remaining $n - 1$ envelopes and stuffs
the second letter into that envelope, and so on, then the entire procedure would
be tantamount to taking an ordered sample of size $n$ without repetition from the
set of symbols $(a_1, \ldots, a_n)$ that represent the different envelopes and then making
the assumption that any such sample is equally likely to occur, according to the
principles described in [P §1.1, 5]. This means that we have an experiment with
$(n)_n = n!$ possible outcomes, every one of which occurs with probability $1/n!$.

2. Denote by $A_i$ the event that the $i$-th letter is placed in its own envelope. Then
$P_{(m)} = P(B_m)$ (see Problem 1.1.13) and, consequently,

$$P_{(m)} = \sum_{k=m}^{n} (-1)^{k-m}\, C_k^m\, S_k.$$

After noticing that in this setting $S_k = 1/k!$, $1 \le k \le n$, one obtains the desired
formula for $P_{(m)}$. Notice that the probability $1 - P_{(0)}$ that at least one of the letters reaches
its recipient equals $\sum_{j=1}^{n} (-1)^{j+1} \frac{1}{j!}$, which is close to $1 - e^{-1}$ even for relatively
small values of $n$; for example, with $n = 5$ this sum equals 0.633333, while
$1 - e^{-1} \approx 0.632121$.


Problem 1.1.18. There are $n$ children in a given kindergarten. When leaving the
kindergarten each child chooses at random one left and one right shoe. Prove that:

(a) The probability $P_a$ that none of the children will bring home his or her own
pair of shoes is given by

$$P_a = \sum_{k=0}^{n} (-1)^k\, \frac{(n-k)!}{k!\, n!}.$$

(b) The probability $P_b$ that none of the children will bring home at least one of
his or her own shoes is given by

$$P_b = \Bigl(\sum_{k=2}^{n} (-1)^k\, \frac{1}{k!}\Bigr)^2.$$

Hint. First, one must give meaning to the phrase “each of the $n$ children chooses
at random one left and one right shoe”; this can be done by following the principle
outlined in the hint to Problem 1.1.17.

(a) Let $A_i$ denote the event that the $i$-th child takes both his or her left and right
shoes. According to the inclusion-exclusion formula, we have

$$P_a = P\Bigl(\bigcap_{i=1}^{n} \overline{A}_i\Bigr) = 1 - P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = 1 - S_1 + S_2 - \dots + (-1)^n S_n,$$

and one must show that in this case $S_k = \frac{(n-k)!}{k!\, n!}$, which gives the desired formula
for $P_a$.

(b) In order to establish the formula for $P_b$, it is enough to notice that $P_b$ is
simply the product of the probability that none of the children brings home his or
her left shoe and the probability that none of the children brings home his or her right
shoe, after which the statement in (b) follows with a straightforward application of
the result established in Problem 1.1.17.
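
Both answers can be verified by exhaustive enumeration for small $n$. The sketch below assumes the model suggested by the hint: the left shoes and the right shoes are assigned by two independent, uniformly chosen permutations, so that there are $(n!)^2$ equally likely outcomes. The function names are hypothetical; the closed forms tested are $\sum_{k=0}^{n} (-1)^k \frac{(n-k)!}{k!\,n!}$ and $\bigl(\sum_{k=2}^{n} (-1)^k/k!\bigr)^2$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def shoes_brute_force(n):
    """Return (P_a, P_b) by enumerating all pairs (left, right) of permutations."""
    perms = list(permutations(range(n)))
    nobody_gets_own_pair = 0
    nobody_gets_own_shoe = 0
    for left in perms:
        for right in perms:
            # (a): no child gets BOTH of his or her own shoes
            if all(not (left[i] == i and right[i] == i) for i in range(n)):
                nobody_gets_own_pair += 1
            # (b): no child gets ANY of his or her own shoes
            if all(left[i] != i and right[i] != i for i in range(n)):
                nobody_gets_own_shoe += 1
    total = factorial(n) ** 2
    return Fraction(nobody_gets_own_pair, total), Fraction(nobody_gets_own_shoe, total)


def p_a(n):
    return sum(Fraction((-1) ** k * factorial(n - k), factorial(k) * factorial(n))
               for k in range(n + 1))


def p_b(n):
    return sum(Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1)) ** 2


for n in range(1, 5):
    assert shoes_brute_force(n) == (p_a(n), p_b(n))
```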


Problem 1.1.19. There are $n$ particles that are distributed in $M$ boxes according
to the Maxwell-Boltzmann statistics (placement without locks of distinguishable
particles in distinguishable cells). By following the classical method of Laplace for
counting probabilities (see [P §1.1, (10)]), which encodes, so to speak, “the random
nature” of the placement of the particles, prove that the probability, $P_k(n; M)$, that
exactly $k$ particles appear in any fixed cell is given by

$$P_k(n; M) = C_n^k\, \frac{(M-1)^{n-k}}{M^n}.$$

Conclude from the above formula that when $n \to \infty$ and $M \to \infty$ in such a way
that $n/M \to \lambda > 0$, then

$$P_k(n; M) \to e^{-\lambda}\, \frac{\lambda^k}{k!}.$$

(Compare with the Poisson distribution; see [P §1.6] and [P §2.3, Table 2].)
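
The Poisson limit can be observed numerically. The following sketch (function names and the particular values $n = 4000$, $M = 2000$, $\lambda = 2$ are illustrative assumptions) evaluates $C_n^k (M-1)^{n-k}/M^n$ with exact integers before converting to a float, and compares the result with $e^{-\lambda}\lambda^k/k!$.

```python
from math import comb, exp, factorial


def maxwell_boltzmann(k, n, M):
    """Probability that a fixed cell holds exactly k of n particles placed into M cells."""
    return comb(n, k) * (M - 1) ** (n - k) / M ** n


def poisson(k, lam):
    return exp(-lam) * lam ** k / factorial(k)


# n / M = 2, so the limiting Poisson parameter is lambda = 2.
n, M, lam = 4000, 2000, 2.0
for k in range(0, 8):
    assert abs(maxwell_boltzmann(k, n, M) - poisson(k, lam)) < 5e-3
```

Increasing $M$ while keeping $n/M$ fixed makes the two columns of values agree to more digits, in line with the limit statement.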

Problem 1.1.20. (Continuation of Problem 1.1.19.) Let $R_m(n; M)$ stand for the
probability that exactly $m$ cells remain empty. Prove that

$$R_m(n; M) = C_M^m \sum_{k=0}^{M-m} (-1)^k\, C_{M-m}^{k} \Bigl(1 - \frac{m+k}{M}\Bigr)^n,$$

and conclude that if $n \to \infty$ and $M \to \infty$ in such a way that $M e^{-n/M} \to \lambda > 0$,
then

$$R_m(n; M) \to e^{-\lambda}\, \frac{\lambda^m}{m!}.$$


Problem 1.1.21. Consider again a random placement of $n$ particles into $M$ cells,
but according to the Bose-Einstein statistics (placement without locks of
indistinguishable particles in distinguishable cells). Denote by $Q_k(n; M)$ the probability
that there are exactly $k$ particles in any fixed cell. Prove that

$$Q_k(n; M) = \frac{C_{M+n-k-2}^{\,n-k}}{C_{M+n-1}^{\,n}},$$

and conclude that when $n \to \infty$ and $M \to \infty$ in such a way that $n/M \to \lambda > 0$,
then

$$Q_k(n; M) \to p\,(1 - p)^k, \quad\text{where } p = \frac{1}{1 + \lambda}.$$

(Compare with the geometric distribution; see [P §2.3, Table 2].)

Problem 1.1.22. A box contains $N$ balls labeled $1, \ldots, N$. A ball is sampled $n$
times from the box randomly and with repetition (i.e., the ball is returned to the box
after each sample). Given any fixed $k \in \{1, \ldots, N\}$, let $A_k$ denote the event that the
largest label found among the sampled balls equals $k$. Prove that

$$P(A_k) = \frac{k^n - (k-1)^n}{N^n}.$$

In addition, prove that if the balls are sampled randomly but without repetition, then
for any $n \le k \le N$ one has

$$P(A_k) = \frac{C_{k-1}^{\,n-1}}{C_N^{\,n}}.$$


Problem 1.1.23. Verify the Leibniz formula for the $N$-th derivative of the product of
two functions $f$ and $g$:

$$D^N (fg) = \sum_{n=0}^{N} C_N^n\, (D^n f)(D^{N-n} g).$$

Hint. Consider using induction and the property $C_{N+1}^{n} = C_N^{n} + C_N^{n-1}$, i.e., the
so-called “Pascal triangle property.”


1.2 Some Classical Models and Distributions 


Problem 1.2.1. Prove that:

$$(x + y)^n = \sum_{k=0}^{n} C_n^k\, x^k y^{n-k} \quad \text{(binomial identity)},$$

$$(x + y)_n = \sum_{k=0}^{n} C_n^k\, (x)_k\, (y)_{n-k} \quad \text{(Vandermonde’s identity)},$$

$$[x + y]_n = \sum_{k=0}^{n} C_n^k\, [x]_k\, [y]_{n-k} \quad \text{(Nørlund’s identity)},$$

where

$$(x)_n = x(x-1)(x-2) \cdots (x-n+1),$$
$$[x]_n = x(x+1)(x+2) \cdots (x+n-1).$$


Hint. Consider using Taylor’s expansion for polynomials.
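
Since all three identities are polynomial identities in $x$ and $y$, evaluating both sides on a grid of integers gives a quick consistency check. The sketch below is illustrative; `falling` and `rising` are hypothetical helper names for $(x)_n$ and $[x]_n$.

```python
from math import comb


def falling(x, n):
    """(x)_n = x (x-1) ... (x-n+1), with (x)_0 = 1."""
    result = 1
    for i in range(n):
        result *= x - i
    return result


def rising(x, n):
    """[x]_n = x (x+1) ... (x+n-1), with [x]_0 = 1."""
    result = 1
    for i in range(n):
        result *= x + i
    return result


for x in range(-4, 5):
    for y in range(-4, 5):
        for n in range(0, 7):
            ks = range(n + 1)
            assert (x + y) ** n == sum(comb(n, k) * x ** k * y ** (n - k) for k in ks)
            assert falling(x + y, n) == sum(
                comb(n, k) * falling(x, k) * falling(y, n - k) for k in ks)
            assert rising(x + y, n) == sum(
                comb(n, k) * rising(x, k) * rising(y, n - k) for k in ks)
```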


Problem 1.2.2. By using probabilistic, combinatorial, or geometric arguments
(say, by counting the number of favorable outcomes, or by counting the number of
paths that connect one point with another), or some other type of reasoning (say,
by way of some algebraic argument analogous to identifying the coefficients of $x^n$
in identities of the form $(1 + x)^a (1 + x)^b = (1 + x)^{a+b}$), verify the following
claims about the binomial coefficients (below $\lfloor x \rfloor$ denotes the integer part of the
real number $x$, i.e., the largest integer which is not greater than $x$, while
$\lceil x \rceil$ denotes the smallest integer which is not smaller than $x$):


• $1 = C_N^0 < C_N^1 < \dots < C_N^{\lfloor N/2 \rfloor} = C_N^{\lceil N/2 \rceil} > \dots > C_N^N = 1$


(symmetry and unimodality);
• $C_N^{k-1} + C_N^{k} = C_{N+1}^{k}$ (this is the Pascal triangle rule);


Ad C= = Cy T 


• $\sum_{k=0}^{N} C_N^k = 2^N$;
k-2.
7) eh 2 Oy 3 , 


N 
DCH? = CH: 


k=0 


N N 
e X 2C =3", Soc, an, 


N 

© SC)" *cK 
k=0 
N 

© Š kk-1)C} 
k=0 


° kC% = NCE! 1 


. CEC =o cs 


k 


=Cy_). M>N+1; 


= N(N =12".. NS: 


N 
k _ akt+l. 
, > Ch = Cy 41 ? 
m=k 


1<k<N; 


, 


k 
© Xc =} Ci» k<N-1: 
j=0 


j=0 
k 
° X Ch = =C) 


j=0 


+k+1° 


Che +C- = CN O<k<N; 


Chm - Sch cr, —k ( 


Cac, Lene 


Cp CRH < (CR), ns 


CE y <(1+ 2 (+7 


or, equivalently, 


Vandermonde’s binomial convolution) ; 


N+1 


’ 


T 
(M+N)! 2 M!N! | 
(M + N)M+N ~ MMNN’ 
NE 
Kl 


NE 


m30 < 


N 
© Sock CDF = 1; 
k=0 


N 
e C=) ey", 1<n<N; 


k=n 
N N 
© agso, Vicktor! =cB, Len =m: 
k=0 k=0 
e CHF =C¥ -CMH 4.20), Man: 
M 
© 5 CDC = CDC; 
k=0 
= (=1)"C” , ifN =2m 
. Setet =| a ; 
(=p 0 if N A 2m 
N š 
, (-1)"(3m)!(m!)-3, if N = 2m 
© eves) -| ee | . 
b= 0 if N A 2m 
N k=l N 
=i 1 
Poca 
k k 
k=1 k=1 
= 0, iff<N 
© ED Ki CE = 
Ss N!, ifl=N 
N 
© > Oo Ch Ho. 
k=0 


(See also Problem 1.2.22.) 


Problem 1.2.3. Prove that if $p$ is a prime number and $1 < k < p - 2$, then $p$
divides $C_p^k$ and one has $C_{2p}^p \equiv 2 \pmod p$.


Problem 1.2.4. Prove that the number of different ways in which a set of $N$ objects
can be split into no more than two disjoint sets, the order in the sets being irrelevant,
equals $\lfloor N/2 \rfloor + 1$, where $\lfloor x \rfloor$ is the integer part of the real number $x$.


Problem 1.2.5. Given $N \ge n \ge 1$, the Stirling number of the second kind, $S_N^n$, is
defined as the number of all possible partitions of a set of $N$ objects, say, the set
$\{1,2,\ldots,N\}$, into $n$ disjoint and non-empty sets with no regard of the order in the
sets.¹
Prove that the following relations are in force:


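(a) $S_N^1 = S_N^N = 1$, $S_N^2 = 2^{N-1} - 1$, $S_N^{N-1} = C_N^2$;

(b) $S_{N+1}^n = S_N^{n-1} + n S_N^n$, $1 \le n \le N$;

(c) $S_{N+1}^{n+1} = \sum_{k=0}^{N} C_N^k S_k^n$ (with $S_k^p = 0$ if $p > k$);

(d) $S_N^n = \frac{1}{n!} \sum_{k=0}^{n} (-1)^k C_n^k (n-k)^N$;

(e) $S_N^n = \frac{1}{n!} \sum_{k=0}^{n} (-1)^{n-k} C_n^k k^N$;

(f) $S_N^n \le n^{N-n} C_{N-1}^{n-1}$.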
Hint. To prove (b), which is the key to deriving (c) and (d), use the relation
$x^N = \sum_{n=0}^{N} S_N^n (x)_n$ (see page 376 in the Appendix) and the relation
$(x)_{n+1} = (x-n)(x)_n$.
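The recurrence and the closed-form alternating sum for the Stirling numbers of the second kind can be cross-checked numerically before attempting a proof. The following is only a sketch: the function names are illustrative, and the two formulas are restated in the docstrings so that the block does not depend on the labels used above.

```python
from math import comb, factorial

def stirling2_table(n_max):
    """Table S[N][n] built from the recurrence
    S(N+1, n) = S(N, n-1) + n * S(N, n), starting from S(0, 0) = 1."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for N in range(n_max):
        for n in range(1, N + 2):
            S[N + 1][n] = S[N][n - 1] + n * S[N][n]
    return S

def stirling2_explicit(N, n):
    """Closed form S(N, n) = (1/n!) * sum_k (-1)^(n-k) * C(n, k) * k^N."""
    total = sum((-1) ** (n - k) * comb(n, k) * k ** N for k in range(n + 1))
    return total // factorial(n)  # the sum is always divisible by n!

S = stirling2_table(8)
print(S[5][1:6])  # row N = 5 of the table
```

Agreement between the two computations for small $N$ is of course not a proof, but it is a quick way to catch a misread index or sign.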


Problem 1.2.6. By using the relation $x^N = \sum_{n=0}^{N} S_N^n (x)_n$, it is shown in Sect. A.3
that the exponential generating function

\[ E_{S^n}(x) = \sum_{N \ge 0} S_N^n \, \frac{x^N}{N!} \]

associated with the sequence $S^n = (S_N^n)_{N \ge 0}$, consisting of Stirling numbers of the
second kind, has the property

\[ E_{S^n}(x) = \frac{(e^x - 1)^n}{n!}. \]

Prove the above identity by using property (e) in the previous problem.


¹ The definitions and some basic facts concerning the Stirling numbers (of the first and the second
kind), and also of the Bell numbers, can be found in Sect. A.1.

Problem 1.2.7. According to one of the definitions of the Stirling numbers of the
first kind, $s_N^n$ (see page 377 in Sect. A.3), the number $(-1)^{N-n} s_N^n$ gives the total
number of permutations of the set $\{1,2,\ldots,N\}$ with exactly $n$ cycles (note that
$s_N^0 = 0$).
Prove that 


(a) sp =s =], 


(b) $s_{N+1}^n = s_N^{n-1} - N s_N^n$, $1 \le n \le N$,

(c) $\sum_{n=1}^{N} (-1)^{N-n} s_N^n = \sum_{n=1}^{N} |s_N^n| = N!$.

In addition, prove that the numbers $s_N^n$, $0 \le n \le N$, satisfy the following algebraic
relation (see page 377)

(d) $(x)_N = \sum_{n=0}^{N} s_N^n x^n$,

where $(x)_N = x(x-1)\ldots(x-N+1)$.
Hint. The recursive relation (b) may be established by way of combinatorial
reasoning. Alternatively, it may be derived directly from the algebraic relation (d).


Problem 1.2.8. Prove the following duality property of the Stirling numbers of the
first and second kinds:

\[ \sum_{n \ge 0} S_N^n \, s_n^M = \delta_{NM}, \]

where $\delta_{ab}$ is the Kronecker symbol associated with the quantities $a$ and $b$, i.e., $\delta_{ab} = 1$
if $a = b$ and $\delta_{ab} = 0$ if $a \ne b$.


Problem 1.2.9. Prove that the exponential generating function

\[ E_{s^n}(x) = \sum_{N \ge 0} s_N^n \, \frac{x^N}{N!} \]

associated with the sequence $s^n = (s_N^n)_{N \ge 0}$, comprised of Stirling numbers of the
first kind, is given by the following formula

\[ E_{s^n}(x) = \frac{(\ln(1+x))^n}{n!}. \]


Problem 1.2.10. Given any $N \ge 1$, the Bell number $B_N$ is defined as (see page 362
in the Appendix) the number of all possible partitions of the set $\{1,2,\ldots,N\}$, or,
which amounts to the same,

\[ B_N = \sum_{n=1}^{N} S_N^n, \]

where $S_N^n$, $1 \le n \le N$, are the Stirling numbers of the second kind.
Setting $B_0 = 1$, prove that:

(a) The following recursive relation is in force

\[ B_N = \sum_{k=1}^{N} C_{N-1}^{k-1} B_{N-k}. \]

(b) The exponential generating function $E_B(x) = \sum_{N \ge 0} B_N \frac{x^N}{N!}$ is given by the
formula
\[ E_B(x) = \exp\{e^x - 1\}. \]

(c) $B_N \le N!$ and $\lim_{N \to \infty} (B_N/N!)^{1/N} = 0$.

Finally, verify that the numbers $B_1, \ldots, B_5$ equal, respectively, 1, 2, 5, 15, 52.
Hint. To prove (b), use (a) and check that the function $x \mapsto E_B(x)$ satisfies the
following first-order equation

\[ \frac{dE_B(x)}{dx} = e^x E_B(x), \]

with boundary condition $E_B(0) = 1$. In order to prove the second property in (c),
consider the radius of convergence $R = 1/\limsup_{N} (B_N/N!)^{1/N}$ for the series $\sum_{N \ge 0} B_N \frac{x^N}{N!}$
which, as is easy to see, converges for all real $x$.
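The recursion for the Bell numbers (with $B_0 = 1$) is easy to run directly, which gives a quick check of the values $B_1, \ldots, B_5$ and a feel for how fast $(B_N/N!)^{1/N}$ decays. This is a minimal sketch; the function name is illustrative.

```python
from math import comb, factorial

def bell_numbers(n_max):
    """Return [B_0, ..., B_{n_max}] using
    B_N = sum_{k=1}^{N} C(N-1, k-1) * B_{N-k}, with B_0 = 1."""
    B = [1]
    for N in range(1, n_max + 1):
        B.append(sum(comb(N - 1, k - 1) * B[N - k] for k in range(1, N + 1)))
    return B

B = bell_numbers(30)
print(B[1:6])
# N-th root of B_N / N! for a few values of N (it should shrink as N grows)
roots = [(B[N] / factorial(N)) ** (1.0 / N) for N in (5, 10, 20, 30)]
print(roots)
```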


Problem 1.2.11. (Fibonacci numbers.) Given any integer $n \ge 1$, let $F_n$ denote the
number of all possible representations of the number $n$ as the sum of an ordered list
of 1’s and 2’s. Thus, one has $F_1 = 1$, $F_2 = 2$ (since $2 = 1 + 1 = 2$), $F_3 = 3$ (since
$3 = 1+1+1 = 1+2 = 2+1$), $F_4 = 5$ (since $4 = 1+1+1+1 = 2+1+1 =
1+2+1 = 1+1+2 = 2+2$), and so on.

(a) Setting $F_0 = 1$, prove that for any $n \ge 2$ the Fibonacci numbers $F_n$ satisfy
the following recursive relation

\[ F_n = F_{n-1} + F_{n-2}, \quad n \ge 2. \qquad (*) \]


(b) By using the above relation, prove that

\[ F_n = \frac{1}{\sqrt{5}} \left[ \left( \frac{1+\sqrt{5}}{2} \right)^{n+1} - \left( \frac{1-\sqrt{5}}{2} \right)^{n+1} \right]. \qquad (**) \]

Notice that $\frac{1+\sqrt{5}}{2} \approx 1.6180339887\ldots$ and $\frac{1-\sqrt{5}}{2} \approx -0.6180339887\ldots$.

(c) By using (*), prove that the generating function $F(x) = \sum_{n \ge 0} F_n x^n$,
associated with the sequence $(F_n)_{n \ge 0}$, is given by the formula

\[ F(x) = \frac{1}{1 - x - x^2}. \qquad (***) \]

(d) The Fibonacci numbers² have many interesting properties. For example,
setting $F_{-1} = 0$, for any choice of the integers $m, n \ge 0$ one can write:

\[ F_0 + F_1 + \ldots + F_n = F_{n+2} - 1, \qquad F_{n-1}^2 + F_n^2 = F_{2n}, \]

\[ F_{n-1} F_n + F_n F_{n+1} = F_{2n+1}, \qquad F_m F_n + F_{m-1} F_{n-1} = F_{m+n}. \]

Verify the last four identities.


(e) Prove that for any $n \ge 0$ the $n$-th Fibonacci number is given by

\[ F_n = \sum_{k=0}^{\lfloor n/2 \rfloor} C_{n-k}^{k}. \]

For example, convince yourself that the list of the first 18 Fibonacci numbers
$\{F_0, F_1, \ldots, F_{17}\}$ is given by

{1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584}.


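(f) Prove that for any $n \le 9$ one has

\[ \frac{F_n}{\lceil e^{(n-1)/2} \rceil} = 1, \]

but for $n = 10$ one has $F_{10} / \lceil e^{9/2} \rceil < 1$ and

\[ \lim_{n \to \infty} \frac{F_n}{\lceil e^{(n-1)/2} \rceil} = 0. \qquad (****) \]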


Hint. (b) To prove (**), start by looking for sequences of the form $(F_n = a^n)_{n \ge 1}$
that satisfy the recursive relation $F_n = F_{n-1} + F_{n-2}$. The formula (**) may
be obtained also by considering the coefficients for $x^n$ in Taylor’s expansion of the
function (***). In this context it is useful to notice that $1 - x - x^2 = (1 - ax)(1 - bx)$,
for $a = (1 + \sqrt{5})/2$ and $b = (1 - \sqrt{5})/2$.

(f) To prove (****), notice that (**) implies $F_n \sim c_1 (1.618\ldots)^n$, while
$\lceil e^{(n-1)/2} \rceil \sim c_2 (1.648\ldots)^n$, with some appropriate constants $c_1$ and $c_2$. Try to find
these two constants.

² Traditionally linked to the population growth of a colony of rabbits, and described as early as the
thirteenth century AD, by Leonardus Pisanus de filiis Bonaccii, widely known under the nickname
“Fibonacci,” in his book “Liber Abaci,” probably written around 1202 CE.
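Several of the claims in this problem are convenient to check by machine before proving them. The sketch below uses the convention $F_0 = F_1 = 1$ from the problem and compares the recursion with the closed form built from $(1 \pm \sqrt{5})/2$, with the sum of binomial coefficients $\sum_k C_{n-k}^k$, and with the rounded-up exponential $\lceil e^{(n-1)/2} \rceil$. The helper names are illustrative, and the closed form is evaluated in floating point, so it is only trusted here for small $n$.

```python
from math import ceil, comb, exp, sqrt

def fibonacci(n_max):
    """Return [F_0, ..., F_{n_max}] with F_0 = F_1 = 1 and F_n = F_{n-1} + F_{n-2}."""
    F = [1, 1]
    for n in range(2, n_max + 1):
        F.append(F[n - 1] + F[n - 2])
    return F[: n_max + 1]

def closed_form(n):
    """Closed form with exponent n + 1 (floating point, rounded to an integer)."""
    a, b = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    return round((a ** (n + 1) - b ** (n + 1)) / sqrt(5))

def binomial_sum(n):
    """Sum of C(n - k, k) over k = 0, ..., floor(n / 2)."""
    return sum(comb(n - k, k) for k in range(n // 2 + 1))

F = fibonacci(17)
exp_ceiling = [ceil(exp((n - 1) / 2)) for n in range(11)]
print(F)
print(exp_ceiling)  # agrees with F_0, ..., F_9 and then overshoots F_10
```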


Problem 1.2.12. Prove that the multinomial (polynomial) coefficients

\[ C_N(n_1, \ldots, n_r) = \frac{N!}{n_1! \ldots n_r!}, \quad n_1 + \ldots + n_r = N, \quad n_i \ge 0, \]

satisfy the following formula, known as Vandermonde’s multinomial convolution
formula:

\[ C_{N_1+N_2}(n_1, \ldots, n_r) = \sum C_{N_1}(k_1, \ldots, k_r) \, C_{N_2}(n_1 - k_1, \ldots, n_r - k_r), \]

the summation being taken over all possible choices of the integers $\{k_i;\ i = 1, \ldots, r\}$,
so that $0 \le k_i \le n_i$, for any $i = 1, \ldots, r$, $\sum_{i=1}^{r} k_i = N_1$, and
$n_1 + \ldots + n_r = N_1 + N_2$.


Problem 1.2.13. Prove that

\[ (x_1 + \ldots + x_r)^N = \sum C_N(n_1, \ldots, n_r) \, x_1^{n_1} \ldots x_r^{n_r}, \]

the summation being taken over all possible choices for the integers $n_1, \ldots, n_r$, in
such a way that $n_i \ge 0$, for any $i = 1, \ldots, r$, and $\sum_{i=1}^{r} n_i = N$.


Problem 1.2.14. Prove that the number of nondecreasing paths on the integer
lattice $\mathbb{Z}_+^r = \{(i_1, \ldots, i_r) : i_1, \ldots, i_r = 0, 1, 2, \ldots\}$ that start at the origin $(0, \ldots, 0)$
and end at some point $(n_1, \ldots, n_r)$ with $\sum_{i=1}^{r} n_i = N$, equals $C_N(n_1, \ldots, n_r)$.
(A path on the lattice $\mathbb{Z}_+^r$ is said to be nondecreasing if at every step only one of the
coordinates changes by +1.)


Problem 1.2.15. Consider the sets $A$ and $B$, chosen so that the numbers of their
elements, resp. $N = |A|$ and $M = |B|$, are both finite, and:

let $\mathcal{F}\colon A \to B$ denote any function from $A$ to $B$, i.e., any rule that assigns a unique
$b \in B$ to any $a \in A$ (one and the same $b \in B$ can be assigned to many $a \in A$);

let $\mathcal{I}\colon A \to B$ denote any injection of $A$ into $B$, i.e., any rule that assigns to
different elements of $A$ different elements of $B$, so that no two elements of $A$ are
assigned one and the same element from $B$ (for this to be possible one must have
$|A| \le |B|$);

let $\mathcal{S}\colon A \to B$ denote any surjection of $A$ onto $B$, i.e., any function from $A$ into
$B$ with the property that for every $b \in B$ there is at least one $a \in A$ with $\mathcal{S}(a) = b$
(for this to be possible one must have $|A| \ge |B|$);

and, finally, let $\mathcal{B}\colon A \to B$ denote any bijection from $A$ onto $B$, i.e., any function
from $A$ into $B$ which is both a surjection and an injection (for this to be possible one
must have $|A| = |B|$).

Prove that the total number of all functions from $A$ into $B$, of all injections of $A$
into $B$, of all surjections from $A$ onto $B$, and of all bijections between $A$ and $B$, are
given, respectively, by:

\[ N(\mathcal{F}) = M^N, \quad N(\mathcal{I}) = (M)_N, \quad N(\mathcal{S}) = M! \, S_N^M, \quad \text{and} \quad N(\mathcal{B}) = N!. \]
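For small sets the four counts can be confirmed by listing every map from an $N$-element set into an $M$-element set. The sketch below does exactly that by brute force; the function names are illustrative, and the Stirling number of the second kind is computed from the standard alternating-sum formula (an assumption of this sketch, restated in the docstring).

```python
from itertools import product
from math import comb, factorial, perm

def stirling2(N, n):
    """S(N, n) = (1/n!) * sum_k (-1)^(n-k) * C(n, k) * k^N."""
    total = sum((-1) ** (n - k) * comb(n, k) * k ** N for k in range(n + 1))
    return total // factorial(n)

def count_maps(N, M):
    """Enumerate all maps {0..N-1} -> {0..M-1}; return
    (number of maps, number of injections, number of surjections)."""
    total = injections = surjections = 0
    for f in product(range(M), repeat=N):
        image = set(f)
        total += 1
        injections += len(image) == N   # all values distinct
        surjections += len(image) == M  # every element of B is hit
    return total, injections, surjections

N, M = 4, 3
print(count_maps(N, M), (M ** N, perm(M, N), factorial(M) * stirling2(N, M)))
```

The enumeration costs $M^N$ steps, so it is only meant for sanity checks on very small $N$ and $M$.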


Problem 1.2.16. Prove that the numbers $P_N = \sum_{n=0}^{N} (N)_n$, $N \ge 0$, with $(N)_0 = 1$
and $(N)_n = N(N-1)\ldots(N-n+1)$, satisfy the following recursive relation:

\[ P_N = N P_{N-1} + 1, \quad N \ge 1. \]


In addition, prove that 


and that $P_N$ is the nearest integer to $eN!$.


Problem 1.2.17. Prove that the exponential generating function

\[ E_P(x) = \sum_{N=0}^{\infty} P_N \, \frac{x^N}{N!} \]

associated with the sequence $P = (P_N)_{N \ge 0}$, which is defined in the previous
problem, satisfies the relation

\[ E_P(x) = \frac{e^x}{1 - x}. \]


Problem 1.2.18. An urn contains $M$ balls labeled $1, 2, \ldots, M$. Each ball is painted
either red or blue. Let $M_1$ denote the number of red balls in the urn and let $M_2$
denote the number of blue balls in the urn ($M_1 + M_2 = M$). Consider an unordered
sample without replacement of size $n = n_1 + n_2 < M$ from the urn and let $B_{n_1,n_2}$
denote the random event that there are exactly $n_1$ red and $n_2$ blue balls in the sample.
Suppose that $M \to \infty$, $M_1 \to \infty$ and $M_2 \to \infty$ in such a way that, for some finite
number $0 < p < 1$, one has $M_1/M \to p$ and $M_2/M \to 1 - p$. Prove that

\[ P(B_{n_1,n_2}) \to C_{n_1+n_2}^{n_1} \, p^{n_1} (1-p)^{n_2}. \]

Hint. Use the identity

\[ P(B_{n_1,n_2}) = C_{n_1+n_2}^{n_1} \, \frac{(M_1)_{n_1} (M_2)_{n_2}}{(M)_{n_1+n_2}}. \]
Problem 1.2.19. Prove that in the multinomial distribution $\{P(A_{n_1,\ldots,n_r})\}$ the
probability $P(A_{k_1,\ldots,k_r})$ is the largest when the list $(k_1, \ldots, k_r)$ is chosen so that

\[ np_i - 1 < k_i \le (n + r - 1) p_i, \quad i = 1, \ldots, r. \]

Problem 1.2.20. (One-dimensional Ising model.) There are $n$ particles placed at
locations $1, \ldots, n$. Each particle is either a type-1 particle or a type-2 particle. The
total number of type-1 particles is $n_1$, the total number of type-2 particles is $n_2$
($n_1 + n_2 = n$) and all $n!$ placements of the particles are equally likely.

Describe the associated probabilistic model and compute the probability of the
event $A_n(m_{11}, m_{12}, m_{21}, m_{22}) = \{\nu_{11} = m_{11}, \ldots, \nu_{22} = m_{22}\}$, where $\nu_{ij}$ denotes
the total number of type-$i$ particles that are placed immediately after a type-$j$
particle ($i, j = 1, 2$).


Problem 1.2.21. Suppose that one must estimate the size N of a certain population 
and that the estimation effort must be “minimal”; in particular, straight counting of 
all individuals in the population cannot be used as a method. Such problems are of 
interest when one must estimate, for example, the total number of citizens in a given 
country, large city, etc. 

In 1786 Pierre-Simon Laplace proposed the following method for estimating the 
total number N of all French citizens: 

Take some number, say, M, of French citizens and record their names. Then 
return those citizens back in the general population so that they are “perfectly 
mixed” with unrecorded individuals. Then choose a “perfectly random” sample 
of n individuals and denote by X the total number of recorded individuals in that 
sample. 

(a) Given some fixed $N$, $M$ and $n$, prove that the probability $P_{N,M;n}\{X = m\}$,
i.e., the probability that the number of recorded individuals in the sample is exactly
equal to $m$, is given by the formula for the hypergeometric distribution (see [P §1.2,
(4)]):

\[ P_{N,M;n}\{X = m\} = \frac{C_M^m \, C_{N-M}^{n-m}}{C_N^n}. \]

(b) For some fixed $M$, $n$ and $m$ find the maximum of $P_{N,M;n}\{X = m\}$ for various
choices of $N$. If $\hat{N}$ denotes the value for $N$ at which that maximum is achieved,
i.e., if $\hat{N}$ is the “most likely” size of the entire population, given that the number
of recorded individuals in the sample is $m$ (this is also known as the maximum
likelihood estimate of $N$), prove that

\[ \hat{N} = \left\lfloor \frac{Mn}{m} \right\rfloor, \]

where $\lfloor \cdot \rfloor$ is the “integer part” function. (This problem continues in Problem 1.7.4.)
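The maximum likelihood claim can be explored numerically by scanning the hypergeometric probability over candidate population sizes. The following sketch uses exact rational arithmetic; the function names and the search bound `N_max` are assumptions made for illustration. When $Mn/m$ happens to be an integer, the likelihood takes the same value at $Mn/m - 1$ and at $Mn/m$, so the test values below are chosen to avoid such ties.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(N, M, n, m):
    """P{X = m}: m recorded individuals in a sample of size n drawn without
    replacement from a population of N that contains M recorded individuals."""
    return Fraction(comb(M, m) * comb(N - M, n - m), comb(N, n))

def most_likely_population(M, n, m, N_max):
    """Scan N from the smallest feasible value up to N_max and return the
    first N that maximizes the likelihood."""
    candidates = range(M + n - m, N_max + 1)
    return max(candidates, key=lambda N: hypergeom_pmf(N, M, n, m))

M, n, m = 100, 50, 7
print(most_likely_population(M, n, m, 2000), (M * n) // m)
```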


Problem 1.2.22. In the (elementary) combinatorial theory the binomial coefficients
$C_M^n = \frac{(M)_n}{n!} = \frac{M!}{n!(M-n)!}$ (denoted equivalently by $\binom{M}{n}$) and the number of ordered
samples $(M)_n = M(M-1)\ldots(M-n+1)$ are defined usually for integer numbers
$n, M \in \mathbb{N} = \{1, 2, \ldots\}$. In some areas of analysis it is often useful to define “the
number of ordered samples $(M)_n$” and “the binomial coefficient $C_M^n$” with $M$
replaced by some arbitrary $X \in \mathbb{R}$. Assuming that $n \in \{0, \pm 1, \pm 2, \ldots\}$, define
$0! = 1$, $(X)_0 = 1$, $C_X^0 = 1$, and then define

\[ (X)_n = X(X-1)\ldots(X-n+1), \qquad C_X^n = \frac{(X)_n}{n!}, \quad \text{for any } n > 0, \]

and $C_X^n = 0$, for any $n < 0$. In conjunction with the above definitions (and
some of the relations established in Problem 1.2.2) prove the following identities
(for arbitrary $X, Y \in \mathbb{R}$ and $n \in \mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$):


$C_X^{n-1} + C_X^n = C_{X+1}^n$ (Pascal triangle property);


Cy = -Dci gi~ =k C s binomial ). 


convolution 


CN A =e 1)"" sf 


n-m 


n=X = xe DEC CM n—k > 


k=0 


n 

n — n—k k š 

Cyt ytn-1 E > Cyn —k-1 Cy 4x1 , 
k=0 


$C_{-X}^n = (-1)^n C_{X+n-1}^n$.


Problem 1.2.23. Consider ordered samples without repetition of size $M$, taken
from an urn that has $N > 2$ balls, of which $n > 2$ are white and $N - n$ are black. Let
$A_{i,j}$ be the event that the $i$-th and the $j$-th balls in the sample are white, $i < j \le M$,
and let $A_{i,j,k}$ be the event that the $i$-th, the $j$-th and the $k$-th balls in the sample are
white, $i < j < k \le M$. Compute the probability of the events $A_{i,j}$ and $A_{i,j,k}$.


Problem 1.2.24. Find a formula for the probability $P_n$ of having $n$ spades in a hand
of 13 cards, taken at random from a full deck of 52 playing cards.


Problem 1.2.25. Consider n > 3 different points on a circle and suppose that 2 of 
these points are chosen at random. What is the probability that these two points are 
“neighbors”? 


Problem 1.2.26. (“The married couples problem,” a.k.a. “problème des ménages.”)
In how many different ways can n married couples (n > 3) be seated at a round table 
in such a manner that men and women alternate, i.e., there are no two men or two 
women sitting next to each other, and, at the same time, there are no husband and 
wife sitting next to each other? 

Hint. Suppose that the seats around the table are labeled (say, clockwise)
$1, \ldots, 2n$, and that seat 1 is always occupied by a woman. Given some $1 \le k \le 2n$,
let $A_k$ denote the event that seats $k$ and $k + 1$ are occupied by some married couple,
with the understanding that seat $2n + 1$ is identified with seat 1. Then the event
that there are no husband and wife sitting next to each other can be expressed as
$\bigcap_{k=1}^{2n} \overline{A}_k$. By the inclusion-exclusion formula (see Part (b) of Problem 1.1.12) one
can write

\[ P\Big( \bigcap_{k=1}^{2n} \overline{A}_k \Big) = 1 - P\Big( \bigcup_{k=1}^{2n} A_k \Big) = 1 - \sum_{i} P(A_i) + \sum_{i<j} P(A_i \cap A_j) - \ldots \]


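A straightforward calculation shows that, for any $1 \le i \le 2n$, one has

\[ P(A_i) = n \left( \frac{(n-1)!}{n!} \right)^2, \]

for any $1 \le i < j \le 2n$ one has

\[ P(A_i \cap A_j) = \begin{cases} n(n-1) \left( \dfrac{(n-2)!}{n!} \right)^2, & \text{if } |i - j| \ne 1, \\ 0, & \text{if } |i - j| = 1, \end{cases} \]

where $P(A_1 \cap A_{2n}) = 0$, and, in general, for any $i_1 < \ldots < i_k$ one has

\[ P(A_{i_1} \cap \ldots \cap A_{i_k}) = \begin{cases} \dfrac{n!}{(n-k)!} \left( \dfrac{(n-k)!}{n!} \right)^2, & \text{if } |i_{j+1} - i_j| \ge 2 \text{ for } 1 \le j < k, \text{ and } 2n + i_1 - i_k \ge 2, \\ 0, & \text{in all other cases.} \end{cases} \]

Consequently,

\[ P\Big( \bigcap_{k=1}^{2n} \overline{A}_k \Big) = \sum_{k=0}^{n} (-1)^k \, d_k \, \frac{(n-k)!}{n!}, \]

where $d_k$ denotes the number of all possible choices of $k$ non-intersecting pairs of
neighboring seats (the pairs $(i, i+1)$ and $(j, j+1)$ are said to be non-intersecting
if either $i + 1 < j$ or $j + 1 < i$). After showing that

\[ d_k = C_{2n-k}^{k} \, \frac{2n}{2n-k}, \]

one arrives at the following conclusion: the probability that no married couple is
seated on two neighboring seats is given by

\[ \sum_{k=0}^{n} (-1)^k \, \frac{(n-k)!}{n!} \, \frac{2n}{2n-k} \, C_{2n-k}^{k}. \]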


Problem 1.2.27. (Latin squares.) A Latin square of size $n \times n$ is simply a square
matrix of size $n \times n$ which is filled with the numbers $1, 2, \ldots, n$ in such a way that
each of these numbers appears precisely once in every column and precisely once
in every row. For example,

\[ \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} \]

are Latin squares of size $2 \times 2$, while

\[ \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ 3 & 1 & 2 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 2 & 3 & 1 \end{pmatrix} \]

are Latin squares of size $3 \times 3$. If $L_n$ stands for the total number of all Latin squares
of size $n \times n$, prove that

\[ L_n \ge n!\,(n-1)! \ldots 1! \quad \Big( = \prod_{k=1}^{n} k! \Big). \]

Remark. One can show, for example, that $L_2 = 2$, $L_3 = 12$, $L_4 = 576$, etc.;
however, an exact general formula for $L_n$ is rather difficult to obtain. Nevertheless,
the following asymptotic result is well known:

\[ \ln L_n = n^2 \ln n + O(n^2), \quad \text{as } n \to \infty. \]
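The small values of $L_n$ quoted in the remark can be reproduced by exhaustive search, which also lets one compare them with the product $n!\,(n-1)! \ldots 1!$. The sketch below builds squares row by row from permutations; the function names are illustrative, and the approach is only practical for very small $n$.

```python
from itertools import permutations
from math import factorial, prod

def count_latin_squares(n):
    """Count n x n Latin squares by adding one permutation row at a time and
    rejecting rows that repeat a symbol in some column."""
    rows = list(permutations(range(n)))

    def extend(square):
        if len(square) == n:
            return 1
        count = 0
        for row in rows:
            if all(row[j] != prev[j] for prev in square for j in range(n)):
                count += extend(square + [row])
        return count

    return extend([])

def factorial_product(n):
    """The product n! (n-1)! ... 1!."""
    return prod(factorial(k) for k in range(1, n + 1))

for n in range(1, 5):
    print(n, count_latin_squares(n), factorial_product(n))
```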


Problem 1.2.28. (G. Pólya’s urn scheme.) Suppose that an urn contains $r$ red and
$b$ black balls and consider the following trial: one ball is drawn at “random” from
the urn, after which that ball and a new ball of the same color are placed back into
the urn. Suppose that this trial is repeated many times and let $S_n$ denote the number
of red balls that have been drawn from the urn during the first $n$ trials. Prove that

\[ P\{S_n = x\} = \frac{C_{r+x-1}^{r-1} \, C_{b+n-x-1}^{b-1}}{C_{r+b+n-1}^{n}}, \quad 0 \le x \le n. \]
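The distribution of $S_n$ can be computed exactly by following the urn one draw at a time, and the result can then be compared with a closed form written with binomial coefficients. In the sketch below the closed form is written as $C_{r+x-1}^{x} C_{b+n-x-1}^{n-x} / C_{r+b+n-1}^{n}$, which uses the symmetry $C_a^k = C_a^{a-k}$; the function names are illustrative and $r, b \ge 1$ is assumed.

```python
from fractions import Fraction
from math import comb

def polya_step_by_step(r, b, n):
    """Exact distribution of the number of red draws in n trials: after t
    draws, x of them red, the urn holds r + x red balls out of r + b + t."""
    dist = {0: Fraction(1)}
    for t in range(n):
        new = {}
        for x, prob in dist.items():
            p_red = Fraction(r + x, r + b + t)
            new[x + 1] = new.get(x + 1, 0) + prob * p_red
            new[x] = new.get(x, 0) + prob * (1 - p_red)
        dist = new
    return dist

def polya_closed_form(r, b, n, x):
    """C(r+x-1, x) * C(b+n-x-1, n-x) / C(r+b+n-1, n)."""
    return Fraction(comb(r + x - 1, x) * comb(b + n - x - 1, n - x),
                    comb(r + b + n - 1, n))

dist = polya_step_by_step(2, 3, 6)
print([str(dist[x]) for x in range(7)])
```

With $r = b = 1$ both computations give the uniform distribution on $\{0, 1, \ldots, n\}$, a classical feature of this scheme.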


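Problem 1.2.29. In the context of Pólya’s urn scheme described in the previous
problem, set
\[ p = \frac{r}{r+b}, \qquad q = \frac{b}{r+b}, \qquad \gamma = \frac{1}{r+b}, \]
and suppose that when $n \to \infty$ one has $p \to 0$ and $\gamma \to 0$ in such a way that
$np \to \lambda$ and $n\gamma \to 1/\rho$. Prove that for any fixed $x$ one has

\[ P\{S_n = x\} \to C_{\lambda\rho + x - 1}^{x} \left( \frac{\rho}{1+\rho} \right)^{\lambda\rho} \left( \frac{1}{1+\rho} \right)^{x} \quad \text{as } n \to \infty. \]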


Problem 1.2.30. Consider the random placement of $2n$ balls, of which $n$ are white
and $n$ are black, into $m$ boxes, labeled $1, \ldots, m$. The probability for a black ball to
be placed in the $j$-th box is $p_j$ ($p_1 + \ldots + p_m = 1$) and the probability for a white
ball to be placed in the $j$-th box is $q_j$ ($q_1 + \ldots + q_m = 1$). Let $\nu$ denote the number of
boxes that contain exactly one white and one black ball. Calculate the probability
$P\{\nu = k\}$, $k = 0, 1, \ldots, m$, and the expected value $E\nu$.


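Problem 1.2.31. (On Stirling’s formula; see also Problem 1.3.16 and Problem
8.8.1.) By the well known asymptotic series expansion for the gamma function,
one has

\[ n! = \sqrt{2\pi n} \left( \frac{n}{e} \right)^n \left( 1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} + O\Big( \frac{1}{n^4} \Big) \right). \]

By using the relations

\[ \ln n! = \sum_{k=2}^{n} \ln k \quad \text{and} \quad \ln (n-1)! < \int_1^n \ln t \, dt < \ln n!, \]

in which $\int_1^n \ln t \, dt = n \ln n - n + 1$, derive the following (rough) lower and upper
bounds for $n!$:

\[ e \left( \frac{n}{e} \right)^n < n! < en \left( \frac{n}{e} \right)^n, \qquad (*) \]

which leads to Stirling’s formula:

\[ n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n. \]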
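A numerical look at the rough bounds $e(n/e)^n < n! < en(n/e)^n$ and at the ratio $n! / (\sqrt{2\pi n}\,(n/e)^n)$ shows how crude the former are and how quickly the latter approaches 1. This is a floating-point sketch with illustrative function names; it is meant for moderate $n$ only.

```python
from math import e, factorial, pi, sqrt

def rough_bounds(n):
    """Return (lower, n!, upper) with lower = e (n/e)^n and upper = e n (n/e)^n."""
    base = (n / e) ** n
    return e * base, factorial(n), e * n * base

def stirling_ratio(n):
    """n! divided by sqrt(2 pi n) (n/e)^n; close to 1 + 1/(12 n) for large n."""
    return factorial(n) / (sqrt(2 * pi * n) * (n / e) ** n)

for n in (2, 5, 10, 50):
    lower, exact, upper = rough_bounds(n)
    print(n, lower < exact < upper, stirling_ratio(n), 1 + 1 / (12 * n))
```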


Problem 1.2.32. (On the asymptotic decomposition of harmonic numbers.) A
harmonic number is a number of the form $H_n = \sum_{k=1}^{n} \frac{1}{k}$, $n \ge 1$. From the well
known asymptotic expansion of the digamma function, one has

\[ H_n = \ln n + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O\Big( \frac{1}{n^6} \Big), \]

where $\gamma = 0.5772\ldots$ is the Euler constant (a.k.a. the Euler-Mascheroni constant).
By using the method developed in the previous problem, for estimating certain
sums in terms of integrals, prove that for any $n > 1$ one has

\[ \ln n + \frac{1}{n} < H_n < \ln n + 1, \]

and conclude that $\lim_{n \to \infty} (H_n / \ln n) = 1$.
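The two-sided bound on $H_n$ and the first terms of the asymptotic expansion can be checked in a few lines. In this floating-point sketch the value of the Euler constant is hard-coded to double precision (an assumption of the sketch), and the function name is illustrative.

```python
from math import log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, hard-coded

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in (2, 10, 100, 1000):
    H = harmonic(n)
    approx = log(n) + EULER_GAMMA + 1 / (2 * n) - 1 / (12 * n ** 2)
    print(n, log(n) + 1 / n < H < log(n) + 1, H - approx, H / log(n))
```

The last column tends to 1 only slowly, since $H_n / \ln n - 1$ behaves like $\gamma / \ln n$.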
Problem 1.3.1. Prove by way of example that, in general, the following identities
do not hold:

\[ P(B \mid A) + P(B \mid \overline{A}) = 1, \]

\[ P(B \mid A) + P(\overline{B} \mid \overline{A}) = 1. \]
Problem 1.3.2. An urn contains $M$ balls of which $M_1$ are white. Consider a
random sample of size $n$ and let $B_j$ denote the event that the ball taken at the
$j$-th drawing in the sample is white. Let $A_k$ denote the event that there are exactly
$k$ white balls in the entire sample of size $n$. Prove that, regardless of whether the
sampling is with replacement or without replacement, one must have

\[ P(B_j \mid A_k) = k/n. \]

Hint. Prove that in the case of sampling with replacement one must have

\[ P(B_j \cap A_k) = \frac{C_{n-1}^{k-1} M_1^k (M - M_1)^{n-k}}{M^n}, \qquad P(A_k) = \frac{C_n^k M_1^k (M - M_1)^{n-k}}{M^n}, \]

while in the case of sampling without replacement one must have

\[ P(B_j \cap A_k) = \frac{C_{n-1}^{k-1} (M_1)_k (M - M_1)_{n-k}}{(M)_n}, \qquad P(A_k) = \frac{C_n^k (M_1)_k (M - M_1)_{n-k}}{(M)_n}. \]


Problem 1.3.3. Let $A_1, \ldots, A_n$ be independent events with $P(A_i) = p_i$.
(a) Prove that

\[ P\Big( \bigcup_{i=1}^{n} A_i \Big) = 1 - \prod_{i=1}^{n} P(\overline{A}_i). \qquad (*) \]

(b) Let $P_0$ be the probability that none of the events $A_1, \ldots, A_n$ occurs. Prove
that
\[ P_0 = \prod_{i=1}^{n} (1 - p_i). \]
Hint. Give a direct proof of the identity in (*), i.e., a proof that makes no
use of the inclusion-exclusion formula (see Problem 1.1.12), by showing that if
$A_1, \ldots, A_n$ are independent events, then any events of the form $\tilde{A}_1, \ldots, \tilde{A}_n$, where
$\tilde{A}_i$ is taken to be either $A_i$ or $\overline{A}_i$, are also independent.

Problem 1.3.4. Suppose that the events $A$ and $B$ are independent. Calculate the
probabilities that exactly $k$, at least $k$ and at most $k$ of the events $A$ and $B$ occur,
$k = 0, 1, 2$ (compare with Problem 1.1.13).


Problem 1.3.5. Suppose that the event $A$ is such that it is independent from itself; in
other words, one can claim that the events $A$ and $A$ are independent. Prove that $P(A)$
equals either 0 or 1. In addition, prove that if the events $A$ and $B$ are independent
and $A \subseteq B$, then either $P(A) = 0$, or $P(B) = 1$.


Problem 1.3.6. Suppose that the event A is such that either P(A) = 1 or P(A) = 0. 
Prove that, given any event B, one can claim that A and B are independent events. 


Problem 1.3.7. Consider the electric circuit from [P §1.4, Fig. 4]. Each of the
relays $A$, $B$, $C$, $D$ and $E$ functions independently, and can be either off (i.e., not
allow electric current to pass through), or on (i.e., allow electric current to pass
through), respectively, with probabilities $p$ and $q$. What is the probability for a signal
submitted at the input to eventually get transmitted through the circuit all the way
to the output? What is the conditional probability for the relay $E$ to have been on,
given that the signal has been transmitted through the circuit and has reached the
output?

Hint. (a) Let $S$ denote the event that the signal submitted at the input has been
received at the output. Then

\[ P(S \mid E) = 1 - 2p^2 + p^4, \qquad P(S \mid \overline{E}) = 2q^2 - q^4, \]
and, according to the total probability formula, one has
\[ P(S) = q(1 - p^2)^2 + pq^2(2 - q^2), \]
while the Bayes formula implies that
\[ P(E \mid S) = \frac{q(1 - p^2)^2}{q(1 - p^2)^2 + pq^2(2 - q^2)}. \]

Problem 1.3.8. Suppose that $P(A + B) > 0$. Prove that

\[ P(A \mid A + B) = \frac{P(A)}{P(A) + P(B)}. \]


Problem 1.3.9. Suppose that the event $A$ is independent from each of the
events $B_n$, $n \ge 1$, chosen so that $B_i \cap B_j = \emptyset$, $i \ne j$. Argue that the events $A$ and
$\bigcup_{n=1}^{\infty} B_n$ are independent.


Problem 1.3.10. Prove that if $P(A \mid C) > P(B \mid C)$ and $P(A \mid \overline{C}) > P(B \mid \overline{C})$,
then $P(A) > P(B)$.


Problem 1.3.11. Prove that

\[ P(A \mid B) = P(A \mid BC) P(C \mid B) + P(A \mid B\overline{C}) P(\overline{C} \mid B). \]

Problem 1.3.12. Suppose that $X$ and $Y$ are two independent binomial random
variables with parameters $n$ and $p$. Prove that

\[ P(X = k \mid X + Y = m) = \frac{C_n^k \, C_n^{m-k}}{C_{2n}^m}, \quad \text{for any } k = 0, 1, \ldots, \min(m, n). \]

Problem 1.3.13. Suppose that $A$, $B$ and $C$ are pairwise independent events, with
$A \cap B \cap C = \emptyset$. Find the largest possible value for $P(A)$.


Problem 1.3.14. Consider an urn which already contains one white ball. One 
randomly chosen ball—either white or black, with equal probability—is added to 
the urn, after which one ball is taken from the urn at random. Assuming that this last 
ball happens to be white, what is the probability that the ball left in the urn is also 
white? 


Problem 1.3.15. If the events $A$ and $B$ are independent, then, just by definition,
one has $P(AB) = P(A)P(B)$. What conditions for $A$ and $B$ would guarantee that
$P(AB) < P(A)P(B)$, or that $P(AB) > P(A)P(B)$?


Problem 1.3.16. In conjunction with the generalization of Stirling’s formula (in
the form $n! \sim \sqrt{2\pi n}\, n^n e^{-n}$, $n \to \infty$), prove that the gamma-function $\Gamma(\nu) =
\int_0^\infty u^{\nu-1} e^{-u} \, du$, $\nu > 0$, has the property:

\[ \Gamma(\nu) \sim \sqrt{2\pi} \, \nu^{\nu - 1/2} e^{-\nu}, \quad \nu \to \infty. \]


1.4 Random Variables and Their Characteristics 


Recall that in the present chapter the underlying sample space, $\Omega$, is assumed to be
finite and, therefore, all random variables under consideration can take only finitely
many values.


Problem 1.4.1. Verify the following properties of the indicators $I_A = I_A(\omega)$:

\[ I_\emptyset = 0, \qquad I_\Omega = 1, \qquad I_{\overline{A}} = 1 - I_A, \]
\[ I_{AB} = I_A \cdot I_B, \qquad I_{A \cup B} = I_A + I_B - I_{AB}, \]
\[ I_{A \setminus B} = I_A (1 - I_B), \qquad I_{A \triangle B} = (I_A - I_B)^2 = I_A + I_B \pmod 2, \]

\[ I_{\bigcup_{i=1}^{n} A_i} = 1 - \prod_{i=1}^{n} (1 - I_{A_i}), \qquad I_{\overline{\bigcup_{i=1}^{n} A_i}} = \prod_{i=1}^{n} (1 - I_{A_i}), \qquad I_{\sum_{i=1}^{n} A_i} = \sum_{i=1}^{n} I_{A_i}, \]

where $A \triangle B$ is the symmetric difference of the sets $A$ and $B$, i.e., the set $(A \setminus B) \cup
(B \setminus A)$, and the summation symbol $\sum$ stands for union ($\bigcup$) of non-intersecting
events.

Problem 1.4.2. Conclude from the statement in Problem 1.4.1 that the following
“inclusion-exclusion” formula for the indicators of the events $A$, $B$ and $C$ is in
force:

\[ I_{A \cup B \cup C} = I_A + I_B + I_C - [I_{A \cap B} + I_{A \cap C} + I_{B \cap C}] + I_{A \cap B \cap C}. \]

Find the analogous representation for the indicator $I_{A_1 \cup \ldots \cup A_n}$ of the union of
$A_1, \ldots, A_n$.


Problem 1.4.3. Suppose that $\xi_1, \ldots, \xi_n$ are Bernoulli random variables with

\[ P\{\xi_i = 0\} = 1 - \lambda_i \Delta, \]
\[ P\{\xi_i = 1\} = \lambda_i \Delta, \]

for some small number $\Delta > 0$, and for some choice of $\lambda_i > 0$. Prove that


i=l 


Problem 1.4.4. Prove that $\inf_{-\infty < a < \infty} E(\xi - a)^2$ is achieved with $a = E\xi$ and that,
consequently,
\[ \inf_{-\infty < a < \infty} E(\xi - a)^2 = D\xi. \]
Hint. Assuming that $E\xi = 0$, prove that $E(\xi - a)^2 = D\xi + a^2 \ge D\xi$.

Problem 1.4.5. Let $\xi$ be any random variable with distribution function $F_\xi(x) =
P\{\xi \le x\}$ and with median $\mu = \mu(\xi) = \mu(F_\xi)$, defined as the only $\mu \in \mathbb{R}$ with

\[ F_\xi(\mu-) \le \frac{1}{2} \le F_\xi(\mu). \]

(For alternative definitions of the notion of median see Problem 1.4.23 below.)
Prove that
\[ \inf_{-\infty < a < \infty} E|\xi - a| = E|\xi - \mu|. \]

Hint. Assuming that $\mu = 0$, prove that for $a > 0$ one has

\[ E|\xi - a| = E|\xi| + E f(\xi), \]

where
\[ f(x) = \begin{cases} a, & x \le 0, \\ a - 2x, & 0 < x \le a, \\ -a, & x > a. \end{cases} \]

Since $f(x) \ge -a$ for $x > 0$ and $P\{\xi \le 0\} \ge 1/2$ (recall that $\mu = 0$), we have
$E f(\xi) \ge a P\{\xi \le 0\} - a P\{\xi > 0\} \ge 0$ and $E|\xi - a| \ge E|\xi|$. An analogous statement
can be made for the case $a < 0$.


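Problem 1.4.6. Let $P_\xi(x) = P\{\xi = x\}$ and $F_\xi(x) = P\{\xi \le x\}$. Prove that for
$a > 0$ and $-\infty < b < \infty$ one has

\[ P_{a\xi + b}(x) = P_\xi\Big( \frac{x - b}{a} \Big), \]

\[ F_{a\xi + b}(x) = F_\xi\Big( \frac{x - b}{a} \Big). \]

In addition, prove that for $y \ge 0$ one has

\[ F_{\xi^2}(y) = F_\xi(+\sqrt{y}) - F_\xi(-\sqrt{y}) + P_\xi(-\sqrt{y}) \]

and, with $\xi^+ = \max(\xi, 0)$, one has

\[ F_{\xi^+}(x) = \begin{cases} 0, & x < 0, \\ F_\xi(x), & x \ge 0. \end{cases} \]

Hint. Use the following relations:

\[ \{a\xi + b = x\} = \Big\{ \xi = \frac{x - b}{a} \Big\}, \qquad \{a\xi + b \le x\} = \Big\{ \xi \le \frac{x - b}{a} \Big\}, \]

\[ \{\xi^2 \le y\} = \{\xi = -\sqrt{y}\} \cup \big( \{\xi \le \sqrt{y}\} \setminus \{\xi \le -\sqrt{y}\} \big), \]

\[ \{\xi^+ \le x\} = \begin{cases} \emptyset, & x < 0, \\ \{\xi \le x\}, & x \ge 0. \end{cases} \]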


Problem 1.4.7. Let $\xi$ and $\eta$ be any two random variables with $D\xi > 0$ and $D\eta > 0$,
and let $\rho = \rho(\xi, \eta)$ denote the correlation between $\xi$ and $\eta$. Prove that $|\rho| \le 1$. In
addition, prove that $|\rho| = 1$ implies that there are constants $a$ and $b$, for which one
can write $\eta = a\xi + b$. Furthermore, if $\rho = 1$, then

\[ \frac{\eta - E\eta}{\sqrt{D\eta}} = \frac{\xi - E\xi}{\sqrt{D\xi}} \]

(so that $\rho = 1$ implies that $a > 0$) and, if $\rho = -1$, then

\[ \frac{\eta - E\eta}{\sqrt{D\eta}} = -\frac{\xi - E\xi}{\sqrt{D\xi}} \]

(so that $\rho = -1$ implies $a < 0$).

Problem 1.4.8. Suppose that $\xi$ and $\eta$ are two random variables with $E\xi = E\eta = 0$
and $D\xi = D\eta = 1$ and with correlation coefficient $\rho = \rho(\xi, \eta)$. Prove that

\[ E \max(\xi^2, \eta^2) \le 1 + \sqrt{1 - \rho^2}. \]

Hint. Use the identity
\[ \max(\xi^2, \eta^2) = \frac{1}{2} \big( \xi^2 + \eta^2 + |\xi^2 - \eta^2| \big) \]
and the Cauchy-Bunyakovsky inequality.


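Problem 1.4.9. By using the property $I_{\bigcup_{i=1}^{n} A_i} = 1 - \prod_{i=1}^{n} (1 - I_{A_i})$, associated
with the indicators from Problem 1.4.1, verify the following “inclusion-exclusion”
formula:

\[ P(A_1 \cup \ldots \cup A_n) = \sum_{1 \le i \le n} P(A_i) - \sum_{1 \le i_1 < i_2 \le n} P(A_{i_1} \cap A_{i_2}) + \ldots \]
\[ \qquad + (-1)^{m+1} \sum_{1 \le i_1 < \ldots < i_m \le n} P(A_{i_1} \cap \ldots \cap A_{i_m}) + \ldots + (-1)^{n+1} P(A_1 \cap \ldots \cap A_n) \]

(comp. with Problem 1.1.12).
Hint. With the substitution $X_i = I_{A_i}$, prove first that the following “inclusion-exclusion”
formula for indicators is in force:

\[ 1 - \prod_{i=1}^{n} (1 - X_i) = \sum_{1 \le i \le n} X_i - \sum_{1 \le i_1 < i_2 \le n} X_{i_1} X_{i_2} + \ldots \]
\[ \qquad + (-1)^{m+1} \sum_{1 \le i_1 < \ldots < i_m \le n} X_{i_1} \ldots X_{i_m} + \ldots + (-1)^{n+1} X_1 \ldots X_n. \]

After that use the relation $P\big( \bigcup_{i=1}^{n} A_i \big) = E I_{\bigcup_{i=1}^{n} A_i}$ (comp. with the hint in
Problem 1.1.13).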

Problem 1.4.10. Suppose that $\xi_1, \ldots, \xi_n$ are independent random variables and that
$\varphi_1 = \varphi_1(\xi_1, \ldots, \xi_k)$ and $\varphi_2 = \varphi_2(\xi_{k+1}, \ldots, \xi_n)$ are any two random variables that
can be written as functions, respectively, of $\xi_1, \ldots, \xi_k$ and $\xi_{k+1}, \ldots, \xi_n$. Prove that
$\varphi_1$ and $\varphi_2$ are independent.


Problem 1.4.11. Prove that the random variables $\xi_1, \ldots, \xi_n$ are independent if and
only if for every choice of the real numbers $x_1, \ldots, x_n$ one has

\[ F_{\xi_1, \ldots, \xi_n}(x_1, \ldots, x_n) = F_{\xi_1}(x_1) \ldots F_{\xi_n}(x_n), \]

where $F_{\xi_1, \ldots, \xi_n}(x_1, \ldots, x_n) = P\{\xi_1 \le x_1, \ldots, \xi_n \le x_n\}$.

Problem 1.4.12. Prove that the random variable $\xi$ is independent from itself, i.e.,
one can claim that $\xi$ and $\xi$ are independent, if and only if $\xi(\omega) = \text{const}$, $\omega \in \Omega$.


Problem 1.4.13. Under what condition for the random variable $\xi$ can one claim
that $\xi$ and $\sin \xi$ are independent?


Problem 1.4.14. Suppose that $\xi$ and $\eta$ are independent random variables and that
$\eta \ne 0$. Find expressions for $P\{\xi\eta \le z\}$ and $P\{\xi/\eta \le z\}$ in terms of the probabilities
$P\{\xi \le x\}$ and $P\{\eta \le y\}$.


Problem 1.4.15. Suppose that the random variables $\xi$, $\eta$ and $\zeta$ are such that $|\xi| \le 1$,
$|\eta| \le 1$, $|\zeta| \le 1$. Prove Bell’s inequality: $|E\xi\zeta - E\eta\zeta| \le 1 - E\xi\eta$ (see [62], for
example).

Hint. Use the inequality $\xi(1 + \eta) \le 1 + \eta$.


Problem 1.4.16. One throws, one-by-one and at random, k balls in n boxes (the 
probability that a given ball would fall in a given box is 1/n). Find the expected 
number of the non-empty boxes. 
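One way to check a candidate answer to this problem is to enumerate all $n^k$ equally likely placements for small $k$ and $n$. The sketch below does this exactly and compares the result with the value $n\big(1 - (1 - 1/n)^k\big)$ obtained by writing the number of non-empty boxes as a sum of indicators; that closed form is stated here as the expression under test, and the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def expected_nonempty(k, n):
    """Average number of distinct boxes hit when k balls are thrown
    independently and uniformly into n boxes (exact, by enumeration)."""
    total = sum(len(set(placement)) for placement in product(range(n), repeat=k))
    return Fraction(total, n ** k)

k, n = 3, 4
print(expected_nonempty(k, n), n * (1 - Fraction(n - 1, n) ** k))
```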


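Problem 1.4.17. Suppose that $\xi_1, \ldots, \xi_n$ are independent and identically distributed
random variables with $P\{\xi_i = 1\} = p$ and $P\{\xi_i = 0\} = 1 - p$, for
some $0 < p < 1$, and let $S_k = \xi_1 + \ldots + \xi_k$, $k \le n$. Prove that, for $1 \le m \le n$,
one has

\[ P(S_m = k \mid S_n = l) = \frac{C_m^k \, C_{n-m}^{l-k}}{C_n^l}. \]

Problem 1.4.18. Suppose that $\xi_1, \ldots, \xi_n$ are independent random variables and let

\[ \xi_{\min} = \min(\xi_1, \ldots, \xi_n) \quad \text{and} \quad \xi_{\max} = \max(\xi_1, \ldots, \xi_n). \]

Prove that

\[ P\{\xi_{\min} \ge x\} = \prod_{i=1}^{n} P\{\xi_i \ge x\} \quad \text{and} \quad P\{\xi_{\max} < x\} = \prod_{i=1}^{n} P\{\xi_i < x\}. \]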


Problem 1.4.19. Let $S_k = \xi_1 + \ldots + \xi_k$, and set $M_{2n} = \max(S_1, \ldots, S_{2n})$. Prove
that, for any $k \le n$, one must have

\[ P\{M_{2n} \ge k, \ S_{2n} = 0\} = P\{S_{2n} = 2k\} \]

and that, therefore,

\[ P(M_{2n} \ge k \mid S_{2n} = 0) = \frac{P\{S_{2n} = 2k\}}{P\{S_{2n} = 0\}} = \frac{C_{2n}^{n+k}}{C_{2n}^{n}}. \]

Conclude from the last relation that

\[ E(M_{2n} \mid S_{2n} = 0) = \frac{1}{2} \left[ \frac{1}{P\{S_{2n} = 0\}} - 1 \right]. \]
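For small $n$ these relations can be verified by listing all $2^{2n}$ paths. The sketch below assumes that the steps $\xi_i$ are independent and take the values $+1$ and $-1$ with probability $1/2$ each (this is not restated in the problem, so it is an assumption here), and it reads the first event as "the maximum is at least $k$". The function name is illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def walk_counts(n):
    """Enumerate all paths of a symmetric +-1 walk of length 2n and return
    (a, b, s): a[k] = #paths with S_2n = 0 and running maximum >= k,
    b[k] = #paths with S_2n = 2k, s = sum of the maximum over paths with S_2n = 0."""
    a = [0] * (n + 1)
    b = [0] * (n + 1)
    s = 0
    for steps in product((1, -1), repeat=2 * n):
        position = maximum = 0  # starting the maximum at 0 is harmless when S_2n = 0
        for step in steps:
            position += step
            maximum = max(maximum, position)
        if position == 0:
            s += maximum
            for k in range(maximum + 1):
                a[k] += 1
        if position >= 0:
            b[position // 2] += 1  # position is always even after 2n steps
    return a, b, s

n = 4
a, b, s = walk_counts(n)
print(a, b, Fraction(s, comb(2 * n, n)))
```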


Problem 1.4.20. Give an example of two random variables, $\xi$ and $\eta$, that share the
same distribution function ($F_\xi = F_\eta$) and have the property $P\{\xi \ne \eta\} > 0$.


Problem 1.4.21. Suppose that $\xi$, $\eta$ and $\zeta$ are random variables, chosen so that
the distribution functions of $\xi$ and $\eta$ coincide. Can one claim that the distribution
functions of $\xi\zeta$ and $\eta\zeta$ also coincide?


Problem 1.4.22. Give an example of two dependent random variables, $\xi$ and $\eta$,
for which $\xi^2$ and $\eta^2$ are independent.


Problem 1.4.23. Suppose that $\xi$ is some discrete random variable. Consider the
following three definitions of the median, $\mu = \mu(\xi)$, of $\xi$ (see Problem 1.4.5):

(a) $\max(P\{\xi > \mu\}, P\{\xi < \mu\}) \le 1/2$;
(b) $P\{\xi < \mu\} \le 1/2 \le P\{\xi \le \mu\}$;
(c) $\mu = \inf\{x \in \mathbb{R} : P\{\xi \le x\} \ge 1/2\}$.

Let $M_a$, $M_b$ and $M_c$ denote the sets of “medians” associated with definitions (a),
(b) and (c), respectively. How do these three sets relate to each other?


Problem 1.4.24. An urn contains $N$ balls, of which $a$ are white, $b$ are black and $c$
are red, $a + b + c = N$. Suppose that $n$ balls are taken from the urn and suppose
that among those $n$ balls there are $\xi$ white balls and $\eta$ black balls. Prove that: if the
balls are sampled with replacement, one has

\[ \mathrm{cov}(\xi, \eta) = -npq, \]

where $p = a/N$ and $q = b/N$, and if the balls are sampled without replacement,
one has

\[ \mathrm{cov}(\xi, \eta) = -npq \, \frac{N - n}{N - 1}. \]

Finally, prove that in both cases the correlation is given by

\[ \rho(\xi, \eta) = -\sqrt{\frac{pq}{(1-p)(1-q)}}. \]


1.5 Bernoulli Scheme I: The Law of Large Numbers 


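Problem 1.5.1. Suppose that $\xi$ and $\eta$ are two random variables with correlation coefficient $\rho$. Verify the following two-dimensional analog of Chebyshev's Inequality:

$$\mathsf{P}\{|\xi - \mathsf{E}\xi| \ge \varepsilon\sqrt{\mathsf{D}\xi}\ \text{ or }\ |\eta - \mathsf{E}\eta| \ge \varepsilon\sqrt{\mathsf{D}\eta}\} \le \frac{1}{\varepsilon^2}\left(1 + \sqrt{1-\rho^2}\right).$$

Hint. Without loss of generality suppose that $\mathsf{E}\xi = \mathsf{E}\eta = 0$ and $\mathsf{D}\xi = \mathsf{D}\eta = 1$, in which case $\mathsf{P}\{|\xi| \ge \varepsilon \text{ or } |\eta| \ge \varepsilon\} = \mathsf{P}\{\max(\xi^2, \eta^2) \ge \varepsilon^2\}$. Then use the
("usual") Chebyshev inequality and the inequality established in Problem 1.4.8.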


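Problem 1.5.2. Suppose that $f = f(x)$ is some non-negative function which is
even and is also non-decreasing for positive $x$. Given any random variable $\xi = \xi(\omega)$, with $|\xi(\omega)| \le C$, $C > 0$, verify the following estimate:

$$\mathsf{P}\{|\xi| \ge \varepsilon\} \ge \frac{\mathsf{E} f(\xi) - f(\varepsilon)}{f(C)}.$$

In particular, for $f(x) = x^2$ one must have

$$\frac{\mathsf{E}\xi^2 - \varepsilon^2}{C^2} \le \mathsf{P}\{|\xi - \mathsf{E}\xi| \ge \varepsilon\} \quad \left(\le \frac{\mathsf{D}\xi}{\varepsilon^2}\right).$$

Hint. Use the following relation

$$\mathsf{E} f(\xi) = \mathsf{E} f(|\xi|) \le f(C)\,\mathsf{P}\{|\xi| \ge \varepsilon\} + f(\varepsilon).$$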


Problem 1.5.3. Let $\xi_1,\dots,\xi_n$ be any sequence of independent random variables
with $\mathsf{D}\xi_i \le C$. Prove that

$$\mathsf{P}\left\{\left|\frac{\xi_1 + \dots + \xi_n}{n} - \frac{\mathsf{E}(\xi_1 + \dots + \xi_n)}{n}\right| \ge \varepsilon\right\} \le \frac{C}{n\varepsilon^2}.$$

(With the conventions adopted in [P §1.5, (8)], the above inequality gives a version
of the law of large numbers, which is more general than the version obtained in the
context of the Bernoulli scheme.)


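Problem 1.5.4. Suppose that $\xi_1,\dots,\xi_n$ are independent Bernoulli random variables with $\mathsf{P}\{\xi_i = 1\} = p > 0$ and $\mathsf{P}\{\xi_i = -1\} = 1 - p$. Verify Bernstein's
estimate: there is some $a > 0$, for which

$$\mathsf{P}\left\{\left|\frac{S_n}{n} - (2p - 1)\right| \ge \varepsilon\right\} \le 2e^{-a\varepsilon^2 n},$$

where $S_n = \xi_1 + \dots + \xi_n$ and $\varepsilon > 0$.
Hint. See the proof of [P §1.6, (42)].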


Problem 1.5.5. Let $\xi$ be any non-negative random variable and let $a > 0$. Find the
maximal possible value for the probability $\mathsf{P}\{\xi \ge a\}$ in each of the following three
cases ($m$ and $\sigma$ are given real numbers):

(i) $\mathsf{E}\xi = m$;
(ii) $\mathsf{E}\xi = m$, $\mathsf{D}\xi = \sigma^2$;
(iii) $\mathsf{E}\xi = m$, $\mathsf{D}\xi = \sigma^2$ and $\xi$ is symmetric relative to its mean value $m$.


Problem 1.5.6. Let $S_0 = 0$ and $S_n = \xi_1 + \dots + \xi_n$, $n \le N$, where $\xi_1,\dots,\xi_N$ is a
Bernoulli sequence of independent random variables, with $\mathsf{P}\{\xi_n = 1\} = p > 0$ and
$\mathsf{P}\{\xi_n = 0\} = q$, $n \le N$, and let $P_n(k) = \mathsf{P}\{S_n = k\}$. Prove that, for $n < N$ and
$k \ge 1$, one has

$$P_{n+1}(k) = p\,P_n(k-1) + q\,P_n(k).$$


Problem 1.5.7. Suppose that $\xi_1,\dots,\xi_N$ are independent Bernoulli random variables, with $\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$, $i = 1,\dots,N$, and let $S_m = \xi_1 + \dots + \xi_m$. Prove that for $2m \le N$ one has

$$\mathsf{P}\{S_1 \cdots S_{2m} \ne 0\} = 2^{-2m}\, C_{2m}^{m}.$$
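
The identity in Problem 1.5.7 can be checked by brute force for small $m$. The following is only a sketch, not a proof: the function name `prob_no_zero` is introduced here for illustration, and the enumeration is feasible only for small values of $m$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def prob_no_zero(m):
    """Exact P{S_1 != 0, ..., S_2m != 0}, found by enumerating all 2^(2m) paths."""
    steps = 2 * m
    good = 0
    for path in product((1, -1), repeat=steps):
        s = 0
        for x in path:
            s += x
            if s == 0:        # the walk touched zero: discard this path
                break
        else:
            good += 1         # the loop finished without hitting zero
    return Fraction(good, 2 ** steps)

for m in range(1, 6):
    print(m, prob_no_zero(m), Fraction(comb(2 * m, m), 4 ** m))
```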


Problem 1.5.8. Consider $M$ cells, labeled $1,\dots,M$. Suppose that the cell with
label $n$ contains one white ball and $n$ black balls. Consider a random sample of balls
from the $M$ cells, let

$$\xi_n = \begin{cases} 1, & \text{if a white ball is drawn from the cell with label } n,\\ 0, & \text{if a black ball is drawn,} \end{cases}$$

and let $S_M = \xi_1 + \dots + \xi_M$ denote the total number of white balls in the sample.
Prove that for large $M$ the quantity $S_M$ "has order" $\ln M$, in the sense that, for any
$\varepsilon > 0$, one has, as $M \to \infty$,

$$\mathsf{P}\left\{\left|\frac{S_M}{\ln M} - 1\right| \ge \varepsilon\right\} \to 0$$

(with the convention adopted in formula [P §1.5, (8)]).


Problem 1.5.9. Suppose that $\xi_1,\dots,\xi_n$ are some independent Bernoulli random
variables, with $\mathsf{P}\{\xi_k = 1\} = p_k$ and $\mathsf{P}\{\xi_k = 0\} = 1 - p_k$, $1 \le k \le n$, and let
$a = \frac{1}{n}\sum_{k=1}^{n} p_k$. Prove that, for any fixed $0 < a < 1$, the variance, $\mathsf{D}S_n$, of the
variable $S_n = \xi_1 + \dots + \xi_n$ attains its maximal value when $p_1 = \dots = p_n = a$.


Problem 1.5.10. Suppose that $\xi_1,\dots,\xi_n$ are some independent Bernoulli random
variables, with $\mathsf{P}\{\xi_k = 1\} = p$ and $\mathsf{P}\{\xi_k = 0\} = 1 - p$, $1 \le k \le n$.
Find the conditional probability that the first 1 ("success") appears in the $m$-th step,
conditioned on the event that in all $n$ steps "success" occurs exactly once.


Problem 1.5.11. Let $(p_1,\dots,p_r)$ and $(q_1,\dots,q_r)$ be any two probability distributions. Prove the Gibbs inequality:

$$-\sum_{i=1}^{r} p_i \ln p_i \le -\sum_{i=1}^{r} p_i \ln q_i.$$

In particular, the entropy $H = -\sum_{i=1}^{r} p_i \ln p_i$ must satisfy the relation $H \le \ln r$
(see [P §1.5, 4]).
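
As a numerical illustration of the Gibbs inequality (not a proof), one can compare the two sides for concrete distributions. The helper names `cross_entropy` and `entropy` and the sample distributions are assumptions made for this sketch; the convention $0 \cdot \ln 0 = 0$ is used, and every $q_i$ is assumed positive wherever $p_i > 0$.

```python
from math import log

def cross_entropy(p, q):
    """-sum_i p_i ln q_i, with terms where p_i = 0 dropped (0 * ln(.) = 0)."""
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """H = -sum_i p_i ln p_i."""
    return cross_entropy(p, p)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]
uniform = [1 / 3] * 3

print(entropy(p), cross_entropy(p, q))   # left side vs. right side of the Gibbs inequality
print(entropy(p), log(3))                # H vs. ln r for r = 3
print(entropy(uniform), log(3))          # for the uniform distribution H matches ln r
```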




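Problem 1.5.12. In the context of Problem 1.5.10, prove the Rényi inequality:

$$\mathsf{P}\left\{\left|\frac{S_n}{n} - p\right| \ge \varepsilon\right\} \le 2\exp\left\{-\frac{n\varepsilon^2}{2pq\left(1 + \frac{\varepsilon}{2pq}\right)^2}\right\}.$$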


1.6 Bernoulli Scheme II: Limit Theorems 
(Local, Moivre—Laplace, Poisson) 


Problem 1.6.1. Let $n = 100$ and consider the choices $p = 1/10, 2/10, 3/10, 4/10, 5/10$. By using the relevant tables for the binomial and the Poisson distributions
(see [12], for example), or by using a computer, compare the exact values of the
following probabilities:

$$\mathsf{P}\{10 < S_{100} \le 12\}, \quad \mathsf{P}\{20 < S_{100} \le 22\}, \qquad (1.1)$$
$$\mathsf{P}\{33 < S_{100} \le 35\}, \quad \mathsf{P}\{40 < S_{100} \le 42\}, \qquad (1.2)$$
$$\mathsf{P}\{50 < S_{100} \le 52\}, \qquad (1.3)$$

with the respective values obtained by the normal and the Poisson
approximations.
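
For readers who prefer a computer to tables, the comparison in Problem 1.6.1 might be set up along the following lines. This is a sketch under stated assumptions: the events are read as $\{lo < S_{100} \le hi\}$, the normal approximation is used without a continuity correction, the Poisson parameter is taken to be $np$, and the function names are introduced here only for illustration.

```python
from math import comb, erf, exp, sqrt

def binomial_prob(n, p, lo, hi):
    """Exact P{lo < S_n <= hi} for a binomial S_n with parameters (n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo + 1, hi + 1))

def normal_approx(n, p, lo, hi):
    """Normal approximation of P{lo < S_n <= hi}, no continuity correction."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return Phi((hi - mu) / sigma) - Phi((lo - mu) / sigma)

def poisson_approx(n, p, lo, hi):
    """Poisson approximation with parameter n * p (assumes lo >= 0)."""
    lam = n * p
    term = exp(-lam)            # Poisson probability of the value 0
    total = 0.0
    for k in range(1, hi + 1):
        term *= lam / k         # Poisson probability of the value k
        if k > lo:
            total += term
    return total

cases = [(0.1, 10, 12), (0.2, 20, 22), (0.3, 33, 35), (0.4, 40, 42), (0.5, 50, 52)]
for p, lo, hi in cases:
    print(p, binomial_prob(100, p, lo, hi),
          normal_approx(100, p, lo, hi), poisson_approx(100, p, lo, hi))
```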


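Problem 1.6.2. Let $p = 1/2$ and let $Z_n = 2S_n - n$ (the aggregate excess of 1's
vs. 0's in $n$ trials, the outcome from each trial being 0 or 1). Prove that

$$\sup_j \left| \sqrt{\pi n}\,\mathsf{P}\{Z_{2n} = j\} - e^{-j^2/(4n)} \right| \to 0, \quad \text{as } n \to \infty.$$

Hint. Setting $2n = m$ and $k = j/2 + n$, the proof comes down to showing that

$$\sup_k \varepsilon(k,m) \to 0, \quad \text{as } m \to \infty, \qquad \text{where} \quad \varepsilon(k,m) = \left| \sqrt{\frac{\pi m}{2}}\,\mathsf{P}\{S_m = k\} - e^{-(k - m/2)^2/(m/2)} \right|.$$

With this relation in mind, one must prove that

$$\sup_k \varepsilon(k,m) = \max(a_m, b_m),$$

where

$$a_m = \sup_{\{k:\,|k - mp| \le (mpq)^s\}} \varepsilon(k,m), \qquad b_m = \sup_{\{k:\,|k - mp| > (mpq)^s\}} \varepsilon(k,m),$$

for some $s \in (1/2, 2/3)$, and then verify that $a_m \to 0$ and $b_m \to 0$ as $m \to \infty$.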


Problem 1.6.3. Prove that in the Poisson theorem (with $p = \lambda/n$, $\lambda > 0$) one has

$$\sup_k \left| P_n(k) - \frac{\lambda^k e^{-\lambda}}{k!} \right| \le \frac{\lambda^2}{n}.$$

Hint. Let $\eta_1,\dots,\eta_n$ and $\zeta_1,\dots,\zeta_n$ be two different sets of independent random
variables, distributed, respectively, according to Poisson's law with parameter $\lambda/n$
and Bernoulli's law with

$$\mathsf{P}\{\zeta_i = 0\} = e^{\lambda/n}(1 - \lambda/n) \quad\text{and}\quad \mathsf{P}\{\zeta_i = 1\} = 1 - e^{\lambda/n}(1 - \lambda/n).$$

Setting

$$\xi_i = \begin{cases} 0, & \text{if } \eta_i = 0,\ \zeta_i = 0,\\ 1 & \text{in all other cases,} \end{cases}$$

notice that $\xi_1,\dots,\xi_n$ are independent Bernoulli random variables with

$$\mathsf{P}\{\xi_i = 0\} = 1 - \frac{\lambda}{n}, \qquad \mathsf{P}\{\xi_i = 1\} = \frac{\lambda}{n},$$

and that the distribution of $\xi = \xi_1 + \dots + \xi_n$ is given by $\mathsf{P}\{\xi = k\} = P_n(k)$. Then
take into account that $\eta = \eta_1 + \dots + \eta_n$ is distributed according to the Poisson law
with parameter $\lambda$, and that, given any $k = 0, 1, 2, \dots$, one has

$$|\mathsf{P}\{\xi = k\} - \mathsf{P}\{\eta = k\}| \le \mathsf{P}\{\xi \ne \eta\} \le \frac{\lambda^2}{n}.$$

(Compare with the results and the proofs in [P §3.12].)
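
The bound in Problem 1.6.3 can be examined numerically. The sketch below assumes $\lambda < n$, so that beyond $k = n$ the Poisson probabilities decrease and it suffices to scan $k = 0,\dots,n+1$; the function name `sup_distance` is introduced here for illustration only.

```python
from math import comb, exp

def sup_distance(n, lam):
    """max over k = 0..n+1 of |P_n(k) - lam^k e^{-lam} / k!| with p = lam / n."""
    p = lam / n
    poisson = exp(-lam)        # Poisson probability of the value 0
    worst = 0.0
    for k in range(n + 2):
        if k > 0:
            poisson *= lam / k # Poisson probability of the value k
        binom = comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0
        worst = max(worst, abs(binom - poisson))
    return worst

for n in (10, 50, 200):
    for lam in (0.5, 1.0, 3.0):
        print(n, lam, sup_distance(n, lam), lam**2 / n)
```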


Problem 1.6.4. Let $\xi_1,\dots,\xi_n$ be independent and identically distributed random
variables with $\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$ (this is a symmetric Bernoulli
scheme), let $S_n = \xi_1 + \dots + \xi_n$, and let $P_n(k) = \mathsf{P}\{S_n = k\}$ for $k \in E_n = \{0, \pm 1, \dots, \pm n\}$. By using the total probability formula (see [P §1.3, (3)]),
verify the following recursive relation (a special case of the Kolmogorov-Chapman
equation, see [P §1.12]):

$$P_{n+1}(k) = \frac{1}{2} P_n(k+1) + \frac{1}{2} P_n(k-1), \quad k \in E_{n+1}, \qquad (*)$$

which is equivalent to

$$P_{n+1}(k) - P_n(k) = \frac{1}{2}\left[P_n(k+1) - 2P_n(k) + P_n(k-1)\right]. \qquad (**)$$


Problem 1.6.5. (Continuation of Problem 1.6.4.) The sequence of random variables $S_0 = 0$, $S_1 = \xi_1$, $S_2 = \xi_1 + \xi_2$, ..., $S_n = \xi_1 + \dots + \xi_n$, may be identified
with the trajectory of a random walk of a particle that starts from 0 and moves one
unit up or down at integer times.

Suppose now that the up and down moves in the random walk occur only at times
$\Delta, 2\Delta, \dots, n\Delta$, for some $\Delta > 0$, and that the particle moves up or down at distance
$\Delta x$. Instead of the probabilities $P_n(k) = \mathsf{P}\{S_n = k\}$, introduced in the previous
problem, consider the probabilities

$$P_{n\Delta}(k\Delta x) = \mathsf{P}\{S_{n\Delta} = k\Delta x\}.$$

Analogously to the recursive relation (**), we find that

$$P_{(n+1)\Delta}(k\Delta x) - P_{n\Delta}(k\Delta x) = \frac{1}{2}\left[P_{n\Delta}((k+1)\Delta x) - 2P_{n\Delta}(k\Delta x) + P_{n\Delta}((k-1)\Delta x)\right],$$

i.e., the (discrete) "first derivative" in the time-parameter coincides up to a factor
of $\frac{1}{2}$ with the (discrete) "second derivative" in the space variable.

With $\Delta x = \sqrt{\Delta}$, $t > 0$, $x \in \mathbb{R}$, consider the special limiting procedure with
$n \to \infty$ and $k \to \infty$, taken so that $n\Delta \to t$ and $k\sqrt{\Delta} \to x$, and prove that for this
procedure one can claim that

(a) the limit $P_t(x) = \lim P_{n\Delta}(k\sqrt{\Delta})$ exists, and,

(b) as a function of $t$, satisfies the heat equation, namely,

$$\frac{\partial P_t(x)}{\partial t} = \frac{1}{2}\,\frac{\partial^2 P_t(x)}{\partial x^2}$$

(L. Bachelier, A. Einstein).


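Problem 1.6.6. To generalize the result in the previous problem, suppose that the
particle moves up at distance $\Delta x$ with probability $p(\Delta) = \frac{1}{2} + \Delta x$, and moves down
at distance $\Delta x$ with probability $q(\Delta) = \frac{1}{2} - \Delta x$. Again set $\Delta x = \sqrt{\Delta}$ and suppose
that $n\Delta \to t$ and $k\sqrt{\Delta} \to x$. Prove that, just as in the previous problem, one can
claim that the limit $P_t(x) = \lim P_{n\Delta}(k\sqrt{\Delta})$ exists and satisfies the equation

$$\frac{\partial P_t(x)}{\partial t} = -2\,\frac{\partial P_t(x)}{\partial x} + \frac{1}{2}\,\frac{\partial^2 P_t(x)}{\partial x^2}.$$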


Problem 1.6.7. What should be changed in the limiting procedures in the last two
problems, in order to claim that the function obtained in the limit satisfies the
equation

$$\frac{\partial P_t(x)}{\partial t} = -\mu\,\frac{\partial P_t(x)}{\partial x} + \frac{1}{2}\,\sigma^2\,\frac{\partial^2 P_t(x)}{\partial x^2},$$

known as the Fokker-Planck equation, or Kolmogorov forward equation?


Problem 1.6.8. Suppose that $F_n = F_n(t)$, $t \in [0,1]$, $n \ge 1$, is some sequence of
nondecreasing functions, with the property $F_n(t) \to t$, for all rational $t \in \mathbb{Q} \cap [0,1]$.
Prove that this convergence must be uniform, i.e.,

$$\sup_{t \in [0,1]} |F_n(t) - t| \to 0 \quad \text{as } n \to \infty$$

(see also [P §3.1, (5)]).


Problem 1.6.9. Prove that, given any $x > 0$, one has

$$\frac{x}{1 + x^2}\,\varphi(x) < 1 - \Phi(x) < \frac{\varphi(x)}{x},$$

where $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ and $\Phi(x) = \int_{-\infty}^{x} \varphi(y)\,dy$.
Hint. Take the derivatives $\varphi'(x)$ and $(x^{-1}\varphi(x))'$.
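
The two-sided bound of Problem 1.6.9 is easy to inspect numerically before proving it. In the sketch below the tail $1 - \Phi(x)$ is expressed through the complementary error function from the Python standard library; the function names are chosen here for illustration.

```python
from math import erfc, exp, pi, sqrt

def phi(x):
    """Standard normal density."""
    return exp(-x * x / 2) / sqrt(2 * pi)

def upper_tail(x):
    """1 - Phi(x), written through the complementary error function."""
    return 0.5 * erfc(x / sqrt(2))

def bounds_hold(x):
    """Check x/(1+x^2) * phi(x) < 1 - Phi(x) < phi(x)/x for a given x > 0."""
    return x / (1 + x * x) * phi(x) < upper_tail(x) < phi(x) / x

for x in (0.1, 0.5, 1.0, 2.0, 3.0, 5.0):
    print(x, x / (1 + x * x) * phi(x), upper_tail(x), phi(x) / x)
```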


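Problem 1.6.10. Prove that the Poisson distribution satisfies the following local
theorem: given any $k = 0, 1, 2, \dots$, as $\lambda \to \infty$ one has

$$\sqrt{\lambda}\,\left| \frac{\lambda^k e^{-\lambda}}{k!} - \frac{1}{\sqrt{2\pi\lambda}} \exp\left\{-\frac{(k - \lambda)^2}{2\lambda}\right\} \right| \to 0.$$

Hint. Use Stirling's formula.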


1.7 Estimate of the Probability for Success in Bernoulli Trials 


Problem 1.7.1. A priori, it is known that the parameter $\theta$ takes values in the set
$\Theta_0 \subseteq [0, 1]$. Explain when it might be possible to find an unbiased estimate for the
parameter $\theta$, that takes values only in the set $\Theta_0$.

Hint. If $\Theta_0$ is a singleton ($\Theta_0 = \{\theta_0\}$), then the value $\theta_0$ must be the estimate
itself. If $\Theta_0$ contains at least two points, then the following condition is necessary
and sufficient for the existence of an unbiased estimate: $0 \in \Theta_0$ and $1 \in \Theta_0$.
Verify this claim.


Problem 1.7.2. In the context of the previous problem, find an analog of the Rao- 
Cramér inequality and investigate the efficiency of the estimate. 


Problem 1.7.3. In the context of the first problem, investigate the construction of
confidence intervals for $\theta$.


Problem 1.7.4. As a continuation of Problem 1.2.21, investigate whether the
estimate $\widehat{N}$ is unbiased and/or efficient, assuming that $N$ is sufficiently large,
$N \gg M$, $N \gg n$. Analogously to the confidence intervals for the parameter $\theta$
(see [P §1.7, (8) and (9)]), construct confidence intervals $[\widehat{N} - a(\widehat{N}),\ \widehat{N} + b(\widehat{N})]$
for $N$ with the property

$$\mathsf{P}_{N,M;n}\{\widehat{N} - a(\widehat{N}) \le N \le \widehat{N} + b(\widehat{N})\} \approx 1 - \varepsilon,$$

where $\varepsilon$ is some small positive number.


Problem 1.7.5. ($\chi^2$-goodness-of-fit test). Suppose that $\xi_1,\dots,\xi_n$ are independent
Bernoulli random variables with $\mathsf{P}\{\xi_i = 1\} = p$ and $\mathsf{P}\{\xi_i = 0\} = 1 - p$, $1 \le i \le n$.
Unlike the main discussion in [P §1.7], which is concerned with estimates of the
probability for "success," $p$, here we are concerned with the problem of testing,
based on the observations $x = (x_1,\dots,x_n)$, of the hypothesis $H_0$: $p = p_0$, i.e., the
hypothesis that the true value of the parameter $p$ equals some given number $0 < p_0 < 1$. Let $S_n(\xi) = \xi_1 + \dots + \xi_n$ and set

$$\chi_n^2(\xi) = \frac{(S_n(\xi) - np_0)^2}{np_0(1 - p_0)}.$$

Assuming that the hypothesis $H_0$ is true, prove that, for any $x \ge 0$, one must have

$$\mathsf{P}\{\chi_n^2(\xi) \le x\} \to \int_0^x \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}\, dy, \quad \text{as } n \to \infty.$$

(According to [P §2.3, Table 3], $F(x) = \int_0^x \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}\, dy$ is the
distribution function of a $\chi^2$-random variable with one degree of freedom, i.e., the
square of a standard (0, 1)-Gaussian random variable.)

The $\chi^2$-goodness-of-fit criterion for testing the hypothesis $H_0$: $p = p_0$ is based
on the following argument. Choose the number $\varepsilon > 0$ so small that, in a single
experiment, events that have probability $\varepsilon$ are extremely unlikely to occur. (If $\varepsilon$ is,
say, 0.01, then by the law of large numbers (see the remark related to formula (8)
in [P §1.5]) an event that occurs in each trial with probability 0.01 will occur "on
average" only once in 100 independent trials.)

Next, consider $\varepsilon > 0$ as fixed and choose $\lambda(\varepsilon)$ so that

$$\int_{\lambda(\varepsilon)}^{\infty} \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}\, dy = \varepsilon.$$

One can now test the hypothesis $H_0$: $p = p_0$ (by the $\chi^2$-goodness-of-fit test) in
the following manner: if the value $\chi_n^2(x)$, calculated from the observations $x = (x_1,\dots,x_n)$, exceeds the quantity $\lambda(\varepsilon)$, then $H_0$ is rejected and if $\chi_n^2(x) \le \lambda(\varepsilon)$
then $H_0$ is accepted, i.e., one assumes that the observation $x = (x_1,\dots,x_n)$ is in
agreement with the property $p = p_0$.

(a) Based on the law of large numbers (see [P §1.5]), argue that, at least for very
large values of $n$, using the $\chi^2$-goodness-of-fit criterion for testing $H_0$: $p = p_0$ is
quite natural.

(b) By using the Berry-Esseen inequality [P §1.6, (24)], prove that, under the
hypothesis $H_0$: $p = p_0$, one must have

$$\sup_x \left| \mathsf{P}\{\chi_n^2(\xi) \le x\} - \int_0^x \frac{1}{\sqrt{2\pi y}}\, e^{-y/2}\, dy \right| \le \frac{2}{\sqrt{np_0(1 - p_0)}}.$$

(c) Suppose that $\lambda_n(\varepsilon)$ is chosen so that $\mathsf{P}\{\chi_n^2(\xi) \ge \lambda_n(\varepsilon)\} \le \varepsilon$. Find the rate
of convergence of $\lambda_n(\varepsilon) \to \lambda(\varepsilon)$ and, this way, determine the error resulting from
replacing the event "$\chi_n^2(\xi) \ge \lambda_n(\varepsilon)$" with the event "$\chi_n^2(\xi) \ge \lambda(\varepsilon)$," which is used
in the $\chi^2$-goodness-of-fit test.

Problem 1.7.6. Let $\xi$ be any binomial random variable with distribution

$$\mathsf{P}_\theta\{\xi = k\} = C_n^k\, \theta^k (1 - \theta)^{n-k}, \quad 0 \le k \le n,$$

where $n$ is some given number and $\theta$ is an "unknown parameter," which must be
estimated by the (unique) observation over the random variable $\xi$.

A standard estimator for $\theta$ is given by the value $T(\xi) = \frac{\xi}{n}$. This estimator is
unbiased: given any $\theta \in [0, 1]$, one has

$$\mathsf{E}_\theta T(\xi) = \theta.$$

Prove that, in the class of unbiased estimators $\widetilde{T} = \widetilde{T}(\xi)$, the estimator $T(\xi)$ is also
efficient:

$$\mathsf{E}_\theta (T(\xi) - \theta)^2 = \inf_{\widetilde{T}} \mathsf{E}_\theta (\widetilde{T}(\xi) - \theta)^2.$$

Argue that, for $n = 3$, if it is a priori known that $\theta \in (\frac{1}{4}, \frac{3}{4})$, then the estimator
$\widehat{T}(\xi) = \frac{1}{2}$, which is biased for every choice of $\theta \ne \frac{1}{2}$, is "better" than the
unbiased estimator $T(\xi) = \frac{\xi}{3}$:

$$\mathsf{E}_\theta [\widehat{T}(\xi) - \theta]^2 < \mathsf{E}_\theta [T(\xi) - \theta]^2,$$

i.e.,

$$\mathsf{E}_\theta \left[\frac{1}{2} - \theta\right]^2 < \mathsf{E}_\theta \left[\frac{\xi}{3} - \theta\right]^2.$$

Investigate the validity of this statement for arbitrary $n$.


Problem 1.7.7. Two correctors, A and B, are proof-reading a book. As a result, 
A detects a misprints and B detects b misprints, of which c misprints are detected 
by both A and B. Assuming that the two correctors work independently from each 
other, give a “reasonable” estimate of the number of misprints that have remained 
undetected. 

Hint. Based on a probabilistic argument, assuming that the number $n$ of all
misprints in the book is quite large, one can suppose that $\frac{a}{n}$ and $\frac{b}{n}$ are reasonably
close to the probabilities $p_a$ and $p_b$ for a misprint to be detected, respectively, by
corrector A and corrector B.
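
One way to turn the hint into a number is sketched below. It rests on an additional assumption that goes slightly beyond the hint: by independence, the fraction $c/n$ of misprints found by both correctors should be close to $p_a p_b \approx (a/n)(b/n)$, which suggests $n \approx ab/c$. The function name and the sample values are illustrative only.

```python
def undetected_estimate(a, b, c):
    """Estimate of the number of misprints missed by both correctors.

    Assumes c / n is close to (a / n) * (b / n), so that n is close to a * b / c;
    the number of misprints found by at least one corrector is a + b - c.
    """
    if c <= 0:
        raise ValueError("the estimate needs at least one misprint found by both")
    n_hat = a * b / c
    return n_hat - (a + b - c)

# Illustrative numbers: A finds 30, B finds 20, 15 of them are found by both.
print(undetected_estimate(30, 20, 15))
```

Under the same assumption the estimate can also be written as $(a - c)(b - c)/c$.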


1.8 Conditional Probabilities and Expectations with Respect 
to Partitions 


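Problem 1.8.1. Give an example of two random variables, $\xi$ and $\eta$, that are not
independent and yet the relation

$$\mathsf{E}(\xi \mid \eta) = \mathsf{E}\xi$$

still holds (see [P §1.8, (22)]).


Problem 1.8.2. The conditional variance of the random variable $\xi$ with respect to
the partition $\mathscr{D}$ is defined as the random variable

$$\mathsf{D}(\xi \mid \mathscr{D}) = \mathsf{E}[(\xi - \mathsf{E}(\xi \mid \mathscr{D}))^2 \mid \mathscr{D}].$$

Prove that the variance of $\xi$ satisfies the relation:

$$\mathsf{D}\xi = \mathsf{E}\,\mathsf{D}(\xi \mid \mathscr{D}) + \mathsf{D}\,\mathsf{E}(\xi \mid \mathscr{D}).$$

Hint. Convince yourself that

$$\mathsf{E}\,\mathsf{D}(\xi \mid \mathscr{D}) = \mathsf{E}\xi^2 - \mathsf{E}[\mathsf{E}(\xi \mid \mathscr{D})]^2 \quad\text{and}\quad \mathsf{D}\,\mathsf{E}(\xi \mid \mathscr{D}) = \mathsf{E}[\mathsf{E}(\xi \mid \mathscr{D})]^2 - (\mathsf{E}\xi)^2.$$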


Problem 1.8.3. Starting from the relation [P §1.8, (17)], prove that, given any
function $f = f(\eta)$, the conditional expectation $\mathsf{E}(\xi \mid \eta)$ has the property:

$$\mathsf{E}[f(\eta)\,\mathsf{E}(\xi \mid \eta)] = \mathsf{E}[\xi f(\eta)].$$


Problem 1.8.4. Given two random variables, $\xi$ and $\eta$, prove that $\inf_f \mathsf{E}(\eta - f(\xi))^2$ is achieved with the function $f^*(\xi) = \mathsf{E}(\eta \mid \xi)$. (This way, the optimal
mean-square-error-minimizing estimator of $\eta$ given $\xi$ can be identified with the
conditional expectation $\mathsf{E}(\eta \mid \xi)$.)

Hint. Convince yourself that, for any function $f = f(x)$, one has

$$\mathsf{E}(\eta - f(\xi))^2 = \mathsf{E}(\eta - f^*(\xi))^2 + 2\,\mathsf{E}[(\eta - f^*(\xi))(f^*(\xi) - f(\xi))] + \mathsf{E}(f^*(\xi) - f(\xi))^2,$$

where the expected value of the variable in box brackets $\mathsf{E}[\,\cdot\,]$ actually vanishes.


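Problem 1.8.5. Let $\xi_1,\dots,\xi_n$ and $\tau$ be independent random variables, such that
$\xi_1,\dots,\xi_n$ are identically distributed and $\tau$ takes its values in the set $1,\dots,n$. Prove
that the sum of a random number of random variables, namely $S_\tau := \xi_1 + \dots + \xi_\tau$,
satisfies the relations

$$\mathsf{E}(S_\tau \mid \tau) = \tau\,\mathsf{E}\xi_1, \qquad \mathsf{D}(S_\tau \mid \tau) = \tau\,\mathsf{D}\xi_1$$

and

$$\mathsf{E}S_\tau = \mathsf{E}\tau \cdot \mathsf{E}\xi_1, \qquad \mathsf{D}S_\tau = \mathsf{E}\tau \cdot \mathsf{D}\xi_1 + \mathsf{D}\tau \cdot (\mathsf{E}\xi_1)^2.$$

Hint. Use the relations
$\mathsf{E}(S_\tau \mid \tau) = \tau\,\mathsf{E}\xi_1$ and $\mathsf{D}(S_\tau \mid \tau) = \tau\,\mathsf{D}\xi_1$.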
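
The formulas for the mean and the variance of a random sum can be confirmed by exact enumeration on a small example. The sketch below uses rational arithmetic; the two distributions and the function names are made up for illustration and are not taken from the original text.

```python
from fractions import Fraction as F
from itertools import product

def moments(dist):
    """Mean and variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    second = sum(v * v * p for v, p in dist.items())
    return mean, second - mean ** 2

def random_sum_moments(xi_dist, tau_dist):
    """Exact mean and variance of S_tau = xi_1 + ... + xi_tau, tau independent of the xi_i."""
    m1 = m2 = F(0)
    for t, pt in tau_dist.items():
        # only xi_1, ..., xi_t enter the sum when tau = t
        for combo in product(xi_dist, repeat=t):
            prob = pt
            for v in combo:
                prob *= xi_dist[v]
            s = sum(combo)
            m1 += prob * s
            m2 += prob * s * s
    return m1, m2 - m1 ** 2

xi_dist = {0: F(1, 3), 1: F(1, 2), 3: F(1, 6)}     # illustrative law of xi_1
tau_dist = {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}    # illustrative law of tau

e_xi, d_xi = moments(xi_dist)
e_tau, d_tau = moments(tau_dist)
print(random_sum_moments(xi_dist, tau_dist))
print(e_tau * e_xi, e_tau * d_xi + d_tau * e_xi ** 2)
```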


Problem 1.8.6. Suppose that the random variable $\xi$ is independent from the partition
$\mathscr{D}$ (i.e., for any $D_i \in \mathscr{D}$, the random variables $\xi$ and $I_{D_i}$ are independent). Prove
that

$$\mathsf{E}(\xi \mid \mathscr{D}) = \mathsf{E}\xi.$$


Problem 1.8.7. Let $\mathscr{E}$ be some experiment, with associated space of possible
outcomes $\Omega = \{\omega_1,\dots,\omega_k\}$, the respective probabilities (i.e., "weights" for the
outcomes) being given by $p_i = p(\omega_i)$, $\sum_{i=1}^{k} p_i = 1$. It is established in [P §1.5,
(14)] that the formula $H = -\sum_{i=1}^{k} p_i \ln p_i$ gives the entropy of the distribution
$(p_1,\dots,p_k)$, defined as a measure of the "uncertainty" in the experiment $\mathscr{E}$. In the
same section it is also shown that the uncertainty is maximal in experiments where
all $k$ outcomes are equally likely to occur, in which case one has $H = \ln k$.

The fact that, in the case where all outcomes are equally likely, the logarithmic
function is a natural measure for the degree of uncertainty in the outcome of the
experiment can be justified with the following argument, which is offered here as an
exercise.

Suppose that the degree of uncertainty in an experiment $\mathscr{E}$, with $k$ outcomes, is
given by some function $f(k)$, chosen so that $f(1) = 0$ and $f(k) > f(l)$ if $k > l$. In
addition, suppose that $f(kl) = f(k) + f(l)$. (This reflects the requirement that, for
independent experiments, $\mathscr{E}_1$ and $\mathscr{E}_2$, respectively, with $k$ outcomes and $l$ outcomes,
the degree of uncertainty in the experiment $\mathscr{E}_1 \otimes \mathscr{E}_2$, which comes down to carrying
out simultaneously $\mathscr{E}_1$ and $\mathscr{E}_2$, must be the sum of the degrees of uncertainty in the
two experiments.)

Prove that under the above conditions $f(k)$ must be of the form: $f(k) = c \log_b k$,
where $c > 0$ is some constant and the logarithm $\log_b k$ is taken with an arbitrary
base $b > 1$.


Remark. As the transition from one logarithmic base to another is given by
$\log_b k = \log_b a \cdot \log_a k$, it is clear that such a transition comes down to changing the
unit in which the uncertainty is being measured. The most common choice is $b = 2$,
which gives $\log_2 k = 1$ for $k = 2$ and therefore allows one to identify the selected
unit of uncertainty with the uncertainty in an experiment with two equally likely
outcomes. In communication theory (and, in particular, in coding theory) such a
unit of uncertainty is called a bit of information, or simply a bit, which originates from
the term BInary digiT. For example, in an experiment $\mathscr{E}$ with $k = 10$ equally likely
outcomes, the degree of uncertainty equals $\log_2 10 \approx 3.3$ bits of information.


Problem 1.8.8. Let $(\Omega, \mathscr{A}, \mathsf{P})$ be any discrete probability space and suppose that
$\xi = \xi(\omega)$, $\omega \in \Omega$, is any random variable that takes its values in the set
$\{x_1,\dots,x_k\}$, with respective probabilities $\mathsf{P}\{\xi = x_i\} = p_i$. The entropy of the
random variable $\xi$ (or, equivalently, of the experiment $\mathscr{E}_\xi$, which comes down to
observing the realization of $\xi$) is defined as

$$H(\xi) = -\sum_{i=1}^{k} p_i \log_2 p_i.$$

(Compare with [P §1.5, (14)], where, instead of the binary logarithm $\log_2$, the natural
logarithm $\ln$ is used; as explained above, this choice is inessential.)

Analogously, given a pair $(\xi, \eta)$ of random variables with $\mathsf{P}\{\xi = x_i, \eta = y_j\} = p_{ij}$, $i = 1,\dots,k$, $j = 1,\dots,l$, the entropy $H(\xi, \eta)$ is defined as

$$H(\xi, \eta) = -\sum_{i=1}^{k} \sum_{j=1}^{l} p_{ij} \log_2 p_{ij}.$$

Prove that if $\xi$ and $\eta$ are independent, then $H(\xi, \eta) = H(\xi) + H(\eta)$.


Problem 1.8.9. Consider a pair $(\xi, \eta)$ of random variables with values $(x_i, y_j)$,
$i = 1,\dots,k$, $j = 1,\dots,l$. The conditional entropy of the random variable $\eta$, given
the event $\{\xi = x_i\}$, is defined as

$$H_{x_i}(\eta) = -\sum_{j=1}^{l} \mathsf{P}\{\eta = y_j \mid \xi = x_i\} \log_2 \mathsf{P}\{\eta = y_j \mid \xi = x_i\}.$$

Then the mean conditional entropy of $\eta$ given $\xi$ is defined as

$$H_\xi(\eta) = \sum_{i=1}^{k} \mathsf{P}\{\xi = x_i\}\, H_{x_i}(\eta).$$

Prove that:

(a) $H(\xi, \eta) = H(\xi) + H_\xi(\eta)$;
(b) if $\xi$ and $\eta$ are independent, then

$$H(\xi, \eta) = H(\xi) + H(\eta);$$

(c) $0 \le H_\xi(\eta) \le H(\eta)$.
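
The definitions of Problem 1.8.9 translate directly into a few lines of code, which can be used to check relations (a) and (c) on a concrete joint distribution. The sketch below is illustrative: the joint table and the function names are made up here, and the convention $0 \cdot \log_2 0 = 0$ is assumed.

```python
from math import log2

def H(probs):
    """Entropy -sum p log2 p of a list of probabilities (terms with p = 0 dropped)."""
    return -sum(p * log2(p) for p in probs if p > 0)

def entropies(joint):
    """joint[i][j] = P{xi = x_i, eta = y_j}.

    Returns (H(xi), H(eta), H(xi, eta), H_xi(eta)).
    """
    p_xi = [sum(row) for row in joint]
    p_eta = [sum(col) for col in zip(*joint)]
    h_joint = H([p for row in joint for p in row])
    # mean conditional entropy: sum_i P{xi = x_i} * H_{x_i}(eta)
    h_cond = sum(pi * H([p / pi for p in row])
                 for pi, row in zip(p_xi, joint) if pi > 0)
    return H(p_xi), H(p_eta), h_joint, h_cond

joint = [[0.25, 0.25],
         [0.40, 0.10]]
h_xi, h_eta, h_joint, h_cond = entropies(joint)
print(h_joint, h_xi + h_cond)   # relation (a)
print(h_cond, h_eta)            # relation (c): 0 <= H_xi(eta) <= H(eta)
```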


Problem 1.8.10. For a pair of random variables, $(\xi, \eta)$, the quantity

$$I_\xi(\eta) = H(\eta) - H_\xi(\eta)$$

gives the amount of information for the variable $\eta$ that is contained in the variable $\xi$.
This terminology is justified by the fact that the difference $H(\eta) - H_\xi(\eta)$ represents
the amount by which observations over $\xi$ decrease the uncertainty of $\eta$, i.e., decrease
the quantity $H(\eta)$.

Prove that:

(a) $I_\xi(\eta) = I_\eta(\xi) \ge 0$;

(b) $I_\xi(\eta) = H(\eta)$ if and only if $\eta$ happens to be a function of $\xi$;

(c) given any three random variables, $\xi$, $\eta$ and $\zeta$, one has

$$I_{(\xi,\zeta)}(\eta) = H(\eta) - H_{(\xi,\zeta)}(\eta) \ge I_\xi(\eta),$$

i.e., the information about $\eta$ contained in observations over $(\xi, \zeta)$ cannot be less than
the information about $\eta$ contained in $\xi$ alone.


Problem 1.8.11. Let $\xi_1,\dots,\xi_n$ be independent and identically distributed
Bernoulli random variables, with $\mathsf{P}\{\xi_i = 1\} = p$ and $\mathsf{P}\{\xi_i = 0\} = 1 - p$,
and let $S_n = \xi_1 + \dots + \xi_n$. Prove that

(a) $\mathsf{P}(\xi_1 = x_1,\dots,\xi_n = x_n \mid S_n = k) = \dfrac{I_{\{x\}}(k)}{C_n^x}$,

(b) $\mathsf{P}(S_n = x \mid S_{n+m} = k) = \dfrac{C_n^x\, C_m^{k-x}}{C_{n+m}^k}$,

where $x = x_1 + \dots + x_n$, $x_i = 0, 1$, so that $x \le k$.


1.9 Random Walk I: Probability of Ruin and Time Until 
Ruin in Coin Tossing 


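Problem 1.9.1. Verify the following generalization of [P §1.9, (33) and (34)]:

$$\mathsf{E}S^x_{\tau^x_n} = x + (p - q)\,\mathsf{E}\tau^x_n,$$

$$\mathsf{E}\left[S^x_{\tau^x_n} - \tau^x_n\,\mathsf{E}\xi_1\right]^2 = \mathsf{D}\xi_1 \cdot \mathsf{E}\tau^x_n + x^2.$$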


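Problem 1.9.2. Consider the quantities $\alpha(x)$, $\beta(x)$ and $m(x)$, which are defined
in [P §1.9], and investigate the limiting behavior of these quantities as the level $A$
decreases to $-\infty$ ($A \searrow -\infty$).

Hint. The answer is this:

$$\lim_{A \to -\infty} \alpha(x) = \begin{cases} 0, & p \ge q,\\ 1 - (p/q)^{B-x}, & p < q, \end{cases}$$

$$\lim_{A \to -\infty} m(x) = \begin{cases} \dfrac{B - x}{p - q}, & p > q,\\ \infty, & p \le q. \end{cases}$$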


Problem 1.9.3. Consider a Bernoulli scheme with $p = q = 1/2$ and prove that

$$\mathsf{E}|S_n| \sim \sqrt{\frac{2}{\pi}\,n}, \quad \text{as } n \to \infty. \qquad (*)$$

Hint. One can verify directly the following discrete version of Tanaka's formula
(see Problem 7.9.8): for any $n \ge 1$, one has

$$|S_n| = \sum_{k=1}^{n} \operatorname{sign}(S_{k-1})\,\Delta S_k + N_n, \qquad (**)$$

where $S_0 = 0$, $S_k = \xi_1 + \dots + \xi_k$, $\Delta S_k = \xi_k$,

$$\operatorname{sign} x = \begin{cases} 1, & x > 0,\\ 0, & x = 0,\\ -1, & x < 0, \end{cases}$$

and $N_n = \#\{0 \le k \le n-1 : S_k = 0\}$ is the number of integers $k$, $0 \le k \le n-1$,
for which $S_k = 0$. Then prove that

$$\mathsf{E}|S_n| = \mathsf{E}N_n = \mathsf{E}\sum_{k=0}^{n-1} I(S_k = 0) = \sum_{k=0}^{n-1} \mathsf{P}\{S_k = 0\} \qquad (***)$$

and use the fact that $\mathsf{P}\{S_{2k} = 0\} = 2^{-2k}\,C_{2k}^{k}$ and that $\mathsf{P}\{S_k = 0\} = 0$ for odd $k$.


Remark. One can conclude from (***) that

$$\mathsf{E}N_n \sim \sqrt{\frac{2}{\pi}\,n}, \quad \text{as } n \to \infty.$$

(See [P §7.9, Example 2]; in formula (15) in that example $2\pi$ must be changed to
$2/\pi$.)
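
Both the identity between the mean of $|S_n|$ and the expected number of zeros and the asymptotic relation (*) can be checked with exact arithmetic for moderate $n$. The following sketch assumes the symmetric case $p = q = 1/2$; the function names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def mean_abs(n):
    """Exact E|S_n| for the symmetric walk: S_n = 2j - n when there are j up-steps."""
    return sum(Fraction(comb(n, j), 2 ** n) * abs(2 * j - n) for j in range(n + 1))

def expected_zeros(n):
    """sum_{k=0}^{n-1} P{S_k = 0}; only even k contribute, with P = C(k, k/2) / 2^k."""
    return sum(Fraction(comb(k, k // 2), 2 ** k) for k in range(0, n, 2))

for n in (1, 2, 3, 10, 11):
    print(n, mean_abs(n), expected_zeros(n))

n = 400
print(float(mean_abs(n)), sqrt(2 * n / pi))
```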


Problem 1.9.4. Two players are tossing symmetric coins (each player tosses his
own coin). Prove that the probability that both players will have the same numbers
of heads after $n$ tosses is given by $2^{-2n} \sum_{k=0}^{n} (C_n^k)^2$ and conclude that the following
identity must hold: $\sum_{k=0}^{n} (C_n^k)^2 = C_{2n}^{n}$ (see Problem 1.2.2).

Let $\sigma_n$ be the first instant when the number of heads obtained by the two players
in a total of $n$ trials coincide, with the understanding that $\sigma_n = n + 1$, if coincidence
does not occur in the first $n$ trials. Calculate the probability $\mathsf{P}\{\sigma_n = k\}$, $1 \le k \le n + 1$, and the expected value $\mathsf{E}\min(\sigma_n, n)$.

Hint. Let $\xi_i^{(k)} = 1$ (or $-1$), if player $k$, $k = 1, 2$, obtains a head (or a tail) in the
$i$-th trial. Then

$$\mathsf{P}\left\{\begin{array}{c}\text{the numbers of heads obtained by}\\ \text{the players after } n \text{ trials coincide}\end{array}\right\} = \mathsf{P}\left\{\sum_{i=1}^{n} \xi_i^{(1)} = \sum_{i=1}^{n} \xi_i^{(2)}\right\}$$

$$= \sum_{j=0}^{n} \mathsf{P}\left\{\sum_{i=1}^{n} \xi_i^{(1)} = 2j - n,\ \sum_{i=1}^{n} \xi_i^{(2)} = 2j - n\right\} = 2^{-2n} \sum_{j=0}^{n} (C_n^j)^2$$

and

$$\mathsf{P}\left\{\sum_{i=1}^{n} \xi_i^{(1)} = \sum_{i=1}^{n} \xi_i^{(2)}\right\} = \mathsf{P}\left\{\sum_{i=1}^{2n} \eta_i = 0\right\} = 2^{-2n}\,C_{2n}^{n},$$

where $\eta_1 = \xi_1^{(1)}$, $\eta_2 = -\xi_1^{(2)}$, $\eta_3 = \xi_2^{(1)}$, $\eta_4 = -\xi_2^{(2)}$, ....


Problem 1.9.5. Suppose that $\xi_1,\dots,\xi_N$ are independent Bernoulli random variables with $\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$ and let $S_n = \xi_1 + \dots + \xi_n$, $1 \le n \le N$.
Compute

$$\mathsf{P}\Big(\bigcup_{N_1 < n \le N_2} \{S_n = 0\}\Big),$$

i.e., compute the probability that at some moment $n \in \{N_1 + 1,\dots,N_2\}$, $N_2 \le N$,
one has $S_n = 0$.

Problem 1.9.6. Suppose that $\xi_1,\dots,\xi_N$ are independent Bernoulli random variables with $\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$, $1 \le i \le N$. Set $S_n = \xi_1 + \dots + \xi_n$,
and consider the discrete telegraph signal $X_n = \xi_1 (-1)^{S_n}$, $1 \le n \le N$. Find the
mean values and the variance of the random variables $X_n$, $1 \le n \le N$. Find also the
conditional distribution $\mathsf{P}\{X_n = 1 \mid \xi_1 = i\}$, $i = \pm 1$, $1 \le n \le N$.


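Problem 1.9.7. Let $\xi_1,\dots,\xi_N$ be independent Bernoulli random variables, with
$\mathsf{P}\{\xi_i = 1\} = p$ and $\mathsf{P}\{\xi_i = -1\} = 1 - p$, and let $S_i = \xi_1 + \dots + \xi_i$, $1 \le i \le N$,
$S_0 = 0$. Let $\mathscr{R}_N$ be the span (or the breadth) of the random walk $\{S_0, S_1,\dots,S_N\}$, i.e., the total number of locations
visited by it.

Calculate $\mathsf{E}\mathscr{R}_N$ and $\mathsf{D}\mathscr{R}_N$. Explain for what values of $p$ one can claim that the
variables $\mathscr{R}_N$ satisfy the following version of the law of large numbers

$$\mathsf{P}\left\{\left|\frac{\mathscr{R}_N}{N} - c\right| > \varepsilon\right\} \to 0, \quad N \to \infty,$$

where $\varepsilon > 0$ and $c$ is some constant. (See Problem 2.6.87 and Problem 8.8.16.)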


Problem 1.9.8. Let $\xi_1,\dots,\xi_N$ be identically distributed random variables (not
necessarily of Bernoulli type) and set $S_0 = 0$, $S_i = \xi_1 + \dots + \xi_i$, $1 \le i \le N$.
Let

$$N_n = \sum_{k=1}^{n} I(S_k > 0)$$

be the total number of positive elements of the sequence $S_0, S_1,\dots,S_n$. Prove the
Sparre-Andersen identity:

$$\mathsf{P}\{N_n = k\} = \mathsf{P}\{N_k = k\}\,\mathsf{P}\{N_{n-k} = 0\}, \quad 0 \le k \le n.$$
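
For small $n$ the Sparre-Andersen identity can be verified by enumerating all paths. The sketch below assumes independent, identically distributed steps with finitely many values; the step laws used in the example and the function names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def positive_count_dist(n, step_dist):
    """Exact law of N_n = #{1 <= k <= n : S_k > 0}; step_dist is {value: probability}."""
    dist = [Fraction(0)] * (n + 1)
    for path in product(step_dist, repeat=n):
        prob, s, count = Fraction(1), 0, 0
        for x in path:
            prob *= step_dist[x]
            s += x
            if s > 0:
                count += 1
        dist[count] += prob
    return dist

def sparre_andersen_holds(n, step_dist):
    """Check P{N_n = k} = P{N_k = k} * P{N_(n-k) = 0} for all 0 <= k <= n."""
    full = positive_count_dist(n, step_dist)
    return all(
        full[k] == positive_count_dist(k, step_dist)[k]
                   * positive_count_dist(n - k, step_dist)[0]
        for k in range(n + 1)
    )

half = Fraction(1, 2)
print(sparre_andersen_holds(6, {1: half, -1: half}))
print(sparre_andersen_holds(5, {2: Fraction(1, 3), -1: Fraction(2, 3)}))
```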


Problem 1.9.9. Let $\xi_1,\dots,\xi_N$ be the Bernoulli random variables from
Problem 1.9.7 and define the variables $X_1,\dots,X_N$ by

$$X_1 = \xi_1, \qquad X_n = \lambda X_{n-1} + \xi_n, \quad 2 \le n \le N,\ \lambda \in \mathbb{R}.$$

Calculate $\mathsf{E}X_n$, $\mathsf{D}X_n$ and $\operatorname{cov}(X_n, X_{n+k})$.


1.10 Random Walk II: The Reflection Principle
and the Arcsine Law


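Problem 1.10.1. Define $\sigma_{2n} = \min\{1 \le k \le 2n : S_k = 0\}$, with the understanding
that $\sigma_{2n} = \infty$ (or $\sigma_{2n} = 2n$), if $S_k \ne 0$ for all $1 \le k \le 2n$. What is the rate of
convergence in $\mathsf{E}\min(\sigma_{2n}, 2n) \to \infty$ as $n \to \infty$?

Hint. Note that according to [P §1.10, 1] one must have

$$\mathsf{E}\min(\sigma_{2n}, 2n) = \sum_{k=1}^{n} u_{2(k-1)} + 2n\,u_{2n},$$

where $u_{2n} \sim 1/\sqrt{\pi n}$, and conclude that

$$\mathsf{E}\min(\sigma_{2n}, 2n) \sim 4\sqrt{\frac{n}{\pi}}, \quad n \to \infty.$$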
T 


Problem 1.10.2. Let $\tau_n = \min\{1 \le k \le n : S_k = 1\}$, with the understanding that
$\tau_n = \infty$ if $S_k < 1$, for all $1 \le k \le n$. What is the limit of $\mathsf{E}\min(\tau_n, n)$ as $n \to \infty$
in the case of a symmetric ($p = q = 1/2$) and non-symmetric ($p \ne q$) Bernoulli
walk?

Hint. The answer here is this:

$$\mathsf{E}\min(\tau_n, n) \to \begin{cases} (p - q)^{-1}, & p > q,\\ \infty, & p \le q. \end{cases}$$


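Problem 1.10.3. Based on the concepts and the methods developed in [P §1.10],
prove that the symmetric ($p = q = 1/2$) Bernoulli random walk $\{S_k, k \le n\}$, given
by $S_0 = 0$ and $S_k = \xi_1 + \dots + \xi_k$, $k \ge 1$, satisfies the following relations ($N$ is
any positive integer):

$$\mathsf{P}\Big\{\max_{1 \le k \le n} S_k \ge N,\ S_n < N\Big\} = \mathsf{P}\{S_n > N\},$$

$$\mathsf{P}\Big\{\max_{1 \le k \le n} S_k \ge N\Big\} = 2\,\mathsf{P}\{S_n \ge N\} - \mathsf{P}\{S_n = N\},$$

$$\mathsf{P}\Big\{\max_{1 \le k \le n} S_k = N\Big\} = \mathsf{P}\{S_n = N\} + \mathsf{P}\{S_n = N + 1\} = 2^{-n}\,C_n^{[(n+N+1)/2]},$$

$$\mathsf{P}\Big\{\max_{1 \le k \le n} S_k \le 0\Big\} = \mathsf{P}\{S_n = 0\} + \mathsf{P}\{S_n = 1\} = 2^{-n}\,C_n^{[n/2]},$$

$$\mathsf{P}\Big\{\max_{1 \le k \le n-1} S_k \le 0,\ S_n > 0\Big\} = \mathsf{P}\{S_1 \ne 0,\dots,S_n \ne 0,\ S_{n+1} = 0\},$$

$$\mathsf{P}\{S_1 > 0,\dots,S_{2n-1} > 0,\ S_{2n} = 0\} = \frac{1}{n}\,2^{-2n}\,C_{2n-2}^{n-1},$$

$$\mathsf{P}\{S_1 \ge 0,\dots,S_{2n-1} \ge 0,\ S_{2n} = 0\} = \frac{1}{n+1}\,2^{-2n}\,C_{2n}^{n}.$$

In addition, prove that the relations

$$\mathsf{P}\{S_{2n} = 2k\} = 2^{-2n}\,C_{2n}^{n-k}, \quad k = 0, \pm 1,\dots,\pm n,$$

and

$$\mathsf{P}\{S_{2n+1} = 2k + 1\} = 2^{-2n-1}\,C_{2n+1}^{n-k}, \quad k = -(n + 1), 0, \pm 1,\dots,\pm n,$$

can be re-written in the form:

$$\mathsf{P}\{S_n = k\} = \begin{cases} 2^{-n}\,C_n^{(n-k)/2}, & \text{if } k \equiv n \pmod 2,\\ 0 & \text{in all other cases,} \end{cases}$$

where $k = 0, \pm 1,\dots,\pm n$.
In addition to the above formula for $\mathsf{P}\{\max_{1 \le k \le n} S_k = N\}$ for positive integers
$N$, prove that

$$\mathsf{P}\Big\{\max_{0 \le k \le n} S_k = r\Big\} = 2^{-n}\,C_n^{[(n-r)/2]}$$

for $r = 0, 1,\dots,n$.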
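
The last formula of Problem 1.10.3, for the law of the maximum over $0 \le k \le n$, lends itself to a brute-force check. The sketch below counts paths of the symmetric walk by the value of their maximum and compares the counts with the binomial coefficient $C_n^{[(n-r)/2]}$; the function name is introduced for illustration, and the enumeration is practical only for small $n$.

```python
from itertools import product
from math import comb

def max_counts(n):
    """counts[r] = number of n-step +-1 paths with max_{0 <= k <= n} S_k = r."""
    counts = [0] * (n + 1)
    for path in product((1, -1), repeat=n):
        s = best = 0              # S_0 = 0 is included in the maximum
        for x in path:
            s += x
            best = max(best, s)
        counts[best] += 1
    return counts

for n in range(1, 9):
    predicted = [comb(n, (n - r) // 2) for r in range(n + 1)]
    print(n, max_counts(n) == predicted)
```

Dividing the counts by $2^n$ gives the probabilities in the formula.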


Problem 1.10.4. Let $\xi_1,\dots,\xi_{2n}$ be independent Bernoulli random variables with
$\mathsf{P}\{\xi_k = 1\} = \mathsf{P}\{\xi_k = -1\} = 1/2$, $k \le 2n$. Let $S_0 = 0$ and $S_k = \xi_1 + \dots + \xi_k$, for
$k \ge 1$, and, finally, let

$$g_{2n} = \max\{0 < 2k \le 2n : S_{2k} = 0\}$$

be the moment of the last zero in the sequence $(S_2, S_4,\dots,S_{2n})$, where we set
$g_{2n} = 0$ if no such moment exists.

Prove that

$$\mathsf{P}\{g_{2n} = 2k\} = u_{2k}\,u_{2(n-k)}, \quad 1 \le k \le n,$$

where $u_{2k} = \mathsf{P}\{S_{2k} = 0\} = 2^{-2k}\,C_{2k}^{k}$.

By comparing the distribution of $g_{2n}$ with the probability $P_{2k,2n}$ of the event that
on the interval $[0, 2n]$ the random walk spends $2k$ units of time in the positive axis
(see formula [P §1.10, (12)]), one finds that, just as in formula [P §1.10, (15)], the
following property holds for $0 < x < 1$:

$$\sum_{\{k:\ 0 < k/n \le x\}} \mathsf{P}\{g_{2n} = 2k\} \to \frac{2}{\pi} \arcsin\sqrt{x}, \quad \text{as } n \to \infty;$$

i.e., the probability distribution of the last zero satisfies the asymptotic arcsine law.


Problem 1.10.5. In the context of the previous problem, let $\theta_{2n}$ denote the moment
of the first maximum in the sequence $S_0, S_1,\dots,S_{2n}$, i.e., $\theta_{2n} = k$, if $S_0 < S_k,\dots,S_{k-1} < S_k$, while $S_{k+1} \le S_k,\dots,S_{2n} \le S_k$, and $\theta_{2n} = 0$ if no such
$k \ge 1$ exists. Prove that

$$\mathsf{P}\{\theta_{2n} = 0\} = u_{2n}, \qquad \mathsf{P}\{\theta_{2n} = 2n\} = \frac{1}{2}\,u_{2n},$$

and that, for $0 < k < n$,

$$\mathsf{P}\{\theta_{2n} = 2k\} = \mathsf{P}\{\theta_{2n} = 2k + 1\} = \frac{1}{2}\,u_{2k}\,u_{2n-2k}.$$

Then conclude that, just as in the previous problem, the law of the moment of the
first maximum satisfies the arcsine law: given any $0 < x < 1$, one has

$$\sum_{\{k:\ 0 < k/n \le x\}} \mathsf{P}\{\theta_{2n} = 2k \text{ or } 2k + 1\} \to \frac{2}{\pi} \arcsin\sqrt{x}, \quad n \to \infty.$$

Consider also the case $x = 0$ and $x = 1$.


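Problem 1.10.6. Let $S_k = \xi_1 + \dots + \xi_k$, $k \le 2n$, where $\xi_1,\dots,\xi_{2n}$ are independent
and identically distributed random variables with $\mathsf{P}\{\xi_1 = 1\} = \mathsf{P}\{\xi_1 = -1\} = 1/2$.
Prove that:

(a) For $r = \pm 1,\dots,\pm n$, one has

$$\mathsf{P}\{S_1 \ne 0,\dots,S_{2n-1} \ne 0,\ S_{2n} = 2r\} = C_{2n}^{n+r}\,\frac{|r|}{n}\,2^{-2n}.$$

(b) For $r = 0, \pm 1,\dots,\pm n$, one has

$$\mathsf{P}\{S_{2n} = 2r\} = C_{2n}^{n-r}\,2^{-2n}.$$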


Problem 1.10.7. Let $\{S_k, k \le n\}$, given by $S_0 = 0$ and $S_k = \xi_1 + \dots + \xi_k$, for
$k \ge 1$, be a symmetric Bernoulli random walk (with independent and identically
distributed $\xi_1,\dots,\xi_n$, with $\mathsf{P}\{\xi_1 = 1\} = \mathsf{P}\{\xi_1 = -1\} = 1/2$). Setting

$$M_n = \max_{0 \le k \le n} S_k, \qquad m_n = \min_{0 \le k \le n} S_k,$$

prove that

$$(M_n - S_n,\ S_n - m_n,\ S_n) \stackrel{\text{law}}{=} (-m_n,\ M_n,\ S_n) \stackrel{\text{law}}{=} (M_n,\ -m_n,\ -S_n),$$

where "$\stackrel{\text{law}}{=}$" means that the respective triplets share the same joint distribution.


Problem 1.10.8. Let $S_0 = 0$ and $S_k = \xi_1 + \dots + \xi_k$, $k \ge 1$, where $\xi_1, \xi_2,\dots$
are independent random variables with $\mathsf{P}\{\xi_k = 1\} = p$ and $\mathsf{P}\{\xi_k = -1\} = q$,
$p + q = 1$. Prove that

$$\mathsf{P}\Big\{\max_{1 \le k \le n} S_k \ge N,\ S_n = m\Big\} = C_n^u\,p^v q^{n-v},$$

where $u = N + (n - m)/2$ and $v = (n + m)/2$, and conclude that, for $p = q = 1/2$
and $m \le N$, one has

$$\mathsf{P}\Big\{\max_{1 \le k \le n} S_k = N,\ S_n = m\Big\} = \mathsf{P}\{S_n = 2N - m\} - \mathsf{P}\{S_n = 2N - m + 2\}.$$


l<k<n 


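Problem 1.10.9. Let $\xi_1, \xi_2,\dots$ be any infinite sequence of independent Bernoulli
random variables with $\mathsf{P}\{\xi_i = +1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$. Define $S_0 = 0$,
$S_n = \xi_1 + \dots + \xi_n$, $n \ge 1$, and, given any $x \in \mathbb{Z} = \{0, \pm 1, \pm 2,\dots\}$, consider the
moment (of the first visit of $x$ after time zero):

$$\sigma_1(x) = \inf\{n > 0 : S_n = x\},$$

with the understanding that $\sigma_1(x) = \infty$, if $\{\cdot\} = \varnothing$.

Prove that, for $x = 1, 2,\dots$, one has

$$\mathsf{P}\{\sigma_1(x) > n\} = \mathsf{P}\Big\{\max_{0 \le k \le n} S_k < x\Big\}, \qquad \mathsf{P}\{\sigma_1(1) = 2n + 1\} = \frac{2^{-2n-1}}{n + 1}\,C_{2n}^{n},$$

$$\mathsf{P}\{\sigma_1(x) = n\} = \frac{x}{n}\,2^{-n}\,C_n^{(n+x)/2}, \qquad \mathsf{P}\{\sigma_1(1) > n\} = 2^{-n}\,C_n^{[n/2]}.$$

Remark. With regard to the question of existence of an infinite sequence of
independent random variables $\xi_1, \xi_2,\dots$, see [P §1.5, 1].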
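
The formula for the law of the first visit to a level $x \ge 1$ can be checked by counting paths with a simple dynamic program that removes a path as soon as it reaches $x$. This is only a sketch: the function name is illustrative, and the formula being compared against is read here as $\mathsf{P}\{\sigma_1(x) = n\} = \frac{x}{n} 2^{-n} C_n^{(n+x)/2}$ when $n + x$ is even, and zero otherwise.

```python
from math import comb

def first_passage_counts(x, n_max):
    """counts[n] = number of n-step +-1 paths whose first visit to x > 0 is at step n."""
    counts = [0] * (n_max + 1)
    alive = {0: 1}                      # position -> number of paths that have not hit x yet
    for n in range(1, n_max + 1):
        nxt = {}
        for pos, c in alive.items():
            for step in (1, -1):
                new = pos + step
                if new == x:
                    counts[n] += c      # first visit to x happens at step n
                else:
                    nxt[new] = nxt.get(new, 0) + c
        alive = nxt
    return counts

x, n_max = 3, 13
counts = first_passage_counts(x, n_max)
for n in range(1, n_max + 1):
    predicted = x * comb(n, (n + x) // 2) // n if (n + x) % 2 == 0 else 0
    print(n, counts[n], predicted)
```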


Problem 1.10.10. Let everything be as in the previous problem. In addition to the
moments $\sigma_1(x)$, define the moments

$$\sigma_k(x) = \inf\{n > \sigma_{k-1}(x) : S_n = x\}, \quad k = 2, 3,\dots,$$

with the understanding that $\sigma_k(x) = \infty$ if $\{\cdot\} = \varnothing$. (The meaning of these
moments should be clear: $\sigma_k(x)$ is the moment of the $k$-th visit to $x$.)

Prove that, for $n = 1, 2,\dots$, one has

$$\mathsf{P}\{\sigma_1(0) = 2n\} = 2^{-2n+1}\,n^{-1}\,C_{2n-2}^{n-1}, \qquad \mathsf{P}\{\sigma_1(0) < \infty\} = 1,$$

$$\mathsf{P}\{\sigma_1(0) > 2n\} = 2^{-2n}\,C_{2n}^{n} = \mathsf{P}\{S_{2n} = 0\}, \qquad \mathsf{E}\sigma_1(0) = \infty.$$

Show also that $\sigma_1(0), \sigma_2(0) - \sigma_1(0), \sigma_3(0) - \sigma_2(0),\dots$ is a sequence of independent
and identically distributed random variables. (This property is the basis for the
method of "regenerating cycles," which is crucial in the study of random walk
sequences; see Sect. A.7 in the Appendix for details.)


Problem 1.10.11. Let $\xi_1, \xi_2, \ldots$ be any infinite sequence of independent Bernoulli random variables and let $S_0 = 0$ and $S_n = \xi_1 + \cdots + \xi_n$, $n \ge 1$. Define

$$L_n(x) = \#\{k,\ 0 < k \le n : S_k = x\}$$

and notice that $L_n(x)$ is nothing but the total number of moments $0 < k \le n$ at which the random walk $(S_k)_{0 \le k \le n}$ happens to be in state $x$, $x = 0, \pm 1, \pm 2, \ldots$ (comp. this definition with the related quantity $N_n(x)$, introduced in Problem 7.9.8; see also Problem 1.9.3). The quantities $L_n(x)$ and $N_n(x)$ are commonly referred to as (discrete) local times in state $x$ on the time interval $\{k : 0 < k \le n\}$.

Prove that, for $k = 0, 1, \ldots, n$, one has:

$$P\{L_{2n}(0) = k\} = P\{L_{2n+1}(0) = k\} = 2^{-2n+k}\, C_{2n-k}^{n},$$


P{L, (0) = k} = a hy 


]—k ’ 
$$P\{L_{2n}(0) < k\} = P\{\sigma_k(0) > 2n\} = 2^{-2n} \sum_{j=0}^{k-1} 2^{j}\, C_{2n-j}^{n},$$


P{L, (x) = 0} = P{o (x) > n} = 5 2i cur l 
j=n+1 


and, for $x = \pm 1, \pm 2, \ldots$, one has:

$$P\{L_{\sigma_1(0)}(x) = 0\} = \frac{2|x| - 1}{2|x|}, \qquad E L_{\sigma_1(0)}(x) = 1.$$

(The quantities $\sigma_k(x)$ are defined in Problem 1.10.10.)


Problem 1.10.12. In the context of the previous problem, set

$$\mu(n) = \min\Big\{k,\ 0 \le k \le n : S_k = \max_{0 \le j \le n} S_j\Big\}$$

and prove that

$$P\{\mu(2n) = k\} = \begin{cases} C_{2[k/2]}^{[k/2]}\, C_{2n-2[k/2]}^{\,n-[k/2]}\, 2^{-2n-1}, & k = 1, \ldots, 2n, \\ C_{2n}^{n}\, 2^{-2n}, & k = 0. \end{cases}$$


1.11 Martingales: Some Applications to the Random Walk


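Problem 1.11.1. Let $\mathscr{D}_0 \preccurlyeq \mathscr{D}_1 \preccurlyeq \cdots \preccurlyeq \mathscr{D}_n$ be any nondecreasing sequence of partitions of $\Omega$, such that $\mathscr{D}_0 = \{\Omega\}$, and let $\eta_k$, $1 \le k \le n$, be some random variables on $\Omega$, chosen so that each $\eta_k$ is $\mathscr{D}_k$-measurable. Prove that the sequence $\xi = (\xi_k, \mathscr{D}_k)_{1 \le k \le n}$, given by

$$\xi_k = \sum_{l=1}^{k} \big[\eta_l - E(\eta_l \mid \mathscr{D}_{l-1})\big],$$

is a martingale.
Hint. Prove that $E(\xi_{k+1} - \xi_k \mid \mathscr{D}_k) = 0$.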


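Problem 1.11.2. Suppose that the random variables $\eta_1, \ldots, \eta_n$ are chosen so that $E\eta_1 = 0$ and $E(\eta_k \mid \eta_1, \ldots, \eta_{k-1}) = 0$, $1 < k \le n$. Prove that the sequence $\xi = (\xi_k)_{1 \le k \le n}$, given by $\xi_1 = \eta_1$ and

$$\xi_{k+1} = \sum_{i=1}^{k} f_i(\eta_1, \ldots, \eta_i)\, \eta_{i+1}, \quad k < n,$$

for some choice of the functions $f_i(\eta_1, \ldots, \eta_i)$, represents a martingale.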


Problem 1.11.3. Prove that any martingale $\xi = (\xi_k, \mathscr{D}_k)_{1 \le k \le n}$ has uncorrelated increments: if $a < b < c < d$, then

$$\operatorname{cov}(\xi_d - \xi_c,\ \xi_b - \xi_a) = 0.$$

(Recall that in the present chapter all random variables are assumed to take only finitely many values.)


Problem 1.11.4. Let $\xi = (\xi_1, \ldots, \xi_n)$ be any random sequence in which each $\xi_k$ is $\mathscr{D}_k$-measurable ($\mathscr{D}_1 \preccurlyeq \mathscr{D}_2 \preccurlyeq \cdots \preccurlyeq \mathscr{D}_n$). Prove that in order for this sequence to be a martingale (relative to the partitions $(\mathscr{D}_k)$), it is necessary and sufficient that, for any stopping time $\tau$ (relative to $(\mathscr{D}_k)$), one has $E\xi_\tau = E\xi_1$. (The phrase "for any stopping time" may be replaced by "for any stopping time that takes only two values".)

Hint. Let $E\xi_\tau = E\xi_1$ for any stopping time $\tau$ that takes only two values. For a fixed $k \in \{1, \ldots, n-1\}$ and $A \in \mathscr{D}_k$, consider the moment

$$\tau(\omega) = \begin{cases} k, & \text{if } \omega \notin A, \\ k+1, & \text{if } \omega \in A. \end{cases}$$

After showing that $E\xi_\tau = E\xi_k I_{\bar A} + E\xi_{k+1} I_A$, conclude that $E\xi_{k+1} I_A = E\xi_k I_A$.


Problem 1.11.5. Prove that if $\xi = (\xi_k, \mathscr{D}_k)_{1 \le k \le n}$ is a martingale and $\tau$ is a stopping time, then for any $k \le n$ one has

$$E[\xi_n I_{\{\tau = k\}}] = E[\xi_k I_{\{\tau = k\}}].$$


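Problem 1.11.6. Let $\xi = (\xi_k, \mathscr{D}_k)_{1 \le k \le n}$ and $\eta = (\eta_k, \mathscr{D}_k)_{1 \le k \le n}$ be any two martingales with $\xi_1 = \eta_1 = 0$. Prove that

$$E\xi_n \eta_n = \sum_{k=2}^{n} E(\xi_k - \xi_{k-1})(\eta_k - \eta_{k-1})$$

and that, in particular,

$$E\xi_n^2 = \sum_{k=2}^{n} E(\xi_k - \xi_{k-1})^2.$$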


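Problem 1.11.7. Let $\eta_1, \ldots, \eta_n$ be any sequence of independent and identically distributed random variables with $E\eta_i = 0$. Prove that the sequence $\xi = (\xi_k)_{1 \le k \le n}$, given by

$$\xi_k = \Big(\sum_{i=1}^{k} \eta_i\Big)^2 - k\, E\eta_1^2, \quad \text{or by} \quad \xi_k = \frac{\exp\{\lambda(\eta_1 + \cdots + \eta_k)\}}{(E \exp\{\lambda \eta_1\})^k},$$

represents a martingale.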


Problem 1.11.8. Let $\eta_1, \ldots, \eta_n$ be any sequence of independent and identically distributed random variables that take values only in the (finite) set $Y$. Let $f_0(y) = P\{\eta_1 = y\} > 0$, $y \in Y$, and let $f_1(y)$ be any non-negative function with $\sum_{y \in Y} f_1(y) = 1$. Prove that the sequence $\xi = (\xi_k, \mathscr{D}_k^{\eta})_{1 \le k \le n}$, with

$$\xi_k = \frac{f_1(\eta_1) \cdots f_1(\eta_k)}{f_0(\eta_1) \cdots f_0(\eta_k)}, \qquad \mathscr{D}_k^{\eta} = \mathscr{D}_{\eta_1, \ldots, \eta_k},$$

forms a martingale. (The variables $\xi_k$ are known as likelihood ratios and play a fundamental role in statistics.)


Problem 1.11.9. We say that the sequence $\xi = (\xi_k, \mathscr{D}_k)_{0 \le k \le n}$ is a supermartingale (submartingale) if P-a.s. one has

$$E(\xi_{k+1} \mid \mathscr{D}_k) \le \xi_k \quad (\ge \xi_k), \quad 0 \le k < n.$$

Prove that every supermartingale (submartingale) can be represented (and in a unique way) in the form

$$\xi_k = m_k - a_k \quad (+a_k),$$

where $m = (m_k, \mathscr{D}_k)_{0 \le k \le n}$ is a martingale and $a = (a_k, \mathscr{D}_k)_{0 \le k \le n}$ is a non-decreasing sequence such that $a_0 = 0$ and each $a_k$ is $\mathscr{D}_{k-1}$-measurable.


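Problem 1.11.10. Let $\xi = (\xi_k, \mathscr{D}_k)_{0 \le k \le n}$ and $\eta = (\eta_k, \mathscr{D}_k)_{0 \le k \le n}$ be any two supermartingales and let $\tau$ be any stopping time, relative to the partitions $(\mathscr{D}_k)_{0 \le k \le n}$, chosen so that $P\{\xi_\tau \ge \eta_\tau\} = 1$. Prove that any sequence $\zeta = (\zeta_k, \mathscr{D}_k)_{0 \le k \le n}$ that switches from $\xi$ to $\eta$ at the random moment $\tau$, i.e., any $\zeta$ given either by

$$\zeta_k = \xi_k I(\tau > k) + \eta_k I(\tau \le k),$$

or by

$$\zeta_k = \xi_k I(\tau \ge k) + \eta_k I(\tau < k),$$

is also a supermartingale.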


Problem 1.11.11. Let $\xi = (\xi_k, \mathscr{D}_k)_{0 \le k \le n}$ be any submartingale of the form

$$\xi_k = \sum_{m \le k} I_{A_m},$$

where $A_m \in \mathscr{D}_m$. Find the Doob decomposition for this submartingale.


Problem 1.11.12. Let $\xi = (\xi_k, \mathscr{D}_k)_{1 \le k \le n}$ be any submartingale. Verify the following "maximal" inequality:

$$E \max_{l \le n} \xi_l^{+} \le \frac{e}{e-1}\big[1 + E(\xi_n^{+} \ln^{+} \xi_n^{+})\big],$$

where $\ln^{+} x = \max(\ln x, 0)$.


1.12 Markov Chains: The Ergodic Theorem: The Strong 
Markov Property 


Problem 1.12.1. Let $\xi = (\xi_0, \xi_1, \ldots, \xi_n)$ be a Markov chain with values in the space $X$ and let $f = f(x)$ ($x \in X$) be some function on $X$. Does the sequence $(f(\xi_0), f(\xi_1), \ldots, f(\xi_n))$ represent a Markov chain? Does the "reverse" sequence $(\xi_n, \xi_{n-1}, \ldots, \xi_0)$ represent a Markov chain?


Problem 1.12.2. Let $\mathbb{P} = \|p_{ij}\|$, $1 \le i, j \le r$, be any stochastic matrix and let $\lambda$ be any eigenvalue of that matrix, i.e., $\lambda$ is a solution to the characteristic equation $\det[\mathbb{P} - \lambda I] = 0$. Prove that $\lambda_1 = 1$ is always an eigenvalue and that the absolute values of all remaining eigenvalues $\lambda_2, \ldots, \lambda_r$ cannot exceed 1. Furthermore, if there is a number $n$ such that $\mathbb{P}^n > 0$ (in the sense that $p_{ij}^{(n)} > 0$), then $|\lambda_i| < 1$, $i = 2, \ldots, r$. Show also that if all eigenvalues $\lambda_1, \ldots, \lambda_r$ are different, then the transition probabilities $p_{ij}^{(k)}$ can be written as

$$p_{ij}^{(k)} = \pi_j + a_{ij}(2)\lambda_2^k + \cdots + a_{ij}(r)\lambda_r^k,$$

where the quantities $\pi_j, a_{ij}(2), \ldots, a_{ij}(r)$ can be expressed in terms of the entries of the matrix $\mathbb{P}$. (In particular, as a result of this algebraic approach to the study of the asymptotic properties of Markov chains, we find that, if $|\lambda_2| < 1, \ldots, |\lambda_r| < 1$, then the limit $\lim_k p_{ij}^{(k)}$ exists for any $j$ and, in fact, does not depend on $i$.)


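Problem 1.12.3. Let $\xi = (\xi_0, \xi_1, \ldots, \xi_n)$ be any homogeneous Markov chain with (finite) state space $X$ and with transition probability matrix $\mathbb{P} = \|p_{xy}\|$. Denote by

$$T\varphi(x) = E[\varphi(\xi_1) \mid \xi_0 = x] \quad \Big(= \sum_{y} \varphi(y)\, p_{xy}\Big)$$

the associated one-step transition operator and suppose that the function $\varphi = \varphi(x)$ satisfies the equation

$$T\varphi(x) = \varphi(x), \quad x \in X,$$

i.e., happens to be "harmonic." Prove that for any such choice of the function $\varphi$, the sequence

$$\zeta = (\zeta_k, \mathscr{D}_k^{\xi})_{0 \le k \le n}, \quad \text{with } \zeta_k = \varphi(\xi_k),$$

is a martingale.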


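Problem 1.12.4. Let $\xi = (\xi_n, \Pi, \mathbb{P})$ and $\tilde\xi = (\tilde\xi_n, \tilde\Pi, \mathbb{P})$ be any two Markov chains that share the same transition matrix $\mathbb{P} = \|p_{ij}\|$, $1 \le i, j \le r$, but have two different initial distributions, resp. $\Pi = (p_1, \ldots, p_r)$ and $\tilde\Pi = (\tilde p_1, \ldots, \tilde p_r)$. Letting $\Pi^{(n)} = (p_1^{(n)}, \ldots, p_r^{(n)})$ and $\tilde\Pi^{(n)} = (\tilde p_1^{(n)}, \ldots, \tilde p_r^{(n)})$ denote the respective $n$-step distributions, prove that if $\min_{i,j} p_{ij} \ge \varepsilon > 0$, then

$$\sum_{i=1}^{r} \big|\tilde p_i^{(n)} - p_i^{(n)}\big| \le 2(1 - r\varepsilon)^n.$$

Hint. Use induction in $n$.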


Problem 1.12.5. Let $\mathbb{P}$ and $\mathbb{Q}$ be any two stochastic matrices. Prove that $\mathbb{P}\mathbb{Q}$ and $\alpha\mathbb{P} + (1 - \alpha)\mathbb{Q}$, for any choice of $0 \le \alpha \le 1$, are also stochastic matrices.
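
As a quick numerical illustration of Problem 1.12.5 (not a proof), the following sketch checks the two closure properties on one concrete pair of matrices. The matrices `P` and `Q` and the helper names are assumptions chosen only for the example; exact fractions avoid rounding issues in the row sums.

```python
from fractions import Fraction as Fr

def is_stochastic(m):
    """True if all entries are non-negative and every row sums to 1."""
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in m)

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mix(a, b, alpha):
    """Convex combination alpha*a + (1 - alpha)*b, entry by entry."""
    return [[alpha * x + (1 - alpha) * y for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]

P = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 4), Fr(3, 4)]]
Q = [[Fr(0), Fr(1)], [Fr(2, 3), Fr(1, 3)]]

assert is_stochastic(P) and is_stochastic(Q)
assert is_stochastic(matmul(P, Q))
assert is_stochastic(mix(P, Q, Fr(1, 3)))
```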


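Problem 1.12.6. Consider any homogeneous Markov chain $(\xi_0, \xi_1, \ldots, \xi_n)$ with state space $X = \{0, 1\}$ and with transition probability matrix of the form

$$\begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix},$$

for some $0 < \alpha < 1$ and $0 < \beta < 1$, and set $S_n = \xi_0 + \cdots + \xi_n$. As a generalization of the de Moivre-Laplace theorem (see [P §1.6]), prove that

$$P\left\{ \frac{S_n - \frac{\alpha}{\alpha+\beta}\, n}{\sqrt{\dfrac{n\alpha\beta(2-\alpha-\beta)}{(\alpha+\beta)^3}}} \le x \right\} \to \Phi(x), \quad \text{as } n \to \infty.$$

Argue that if $\alpha + \beta = 1$ one can claim that the variables $\xi_0, \ldots, \xi_n$ are independent and that the last relation comes down to

$$P\left\{ \frac{S_n - \alpha n}{\sqrt{n\alpha\beta}} \le x \right\} \to \Phi(x), \quad \text{as } n \to \infty.$$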

Problem 1.12.7. Let $\xi_0, \xi_1, \ldots, \xi_N$ be any Bernoulli sequence of independent random variables with $P\{\xi_i = 1\} = P\{\xi_i = -1\} = 1/2$. Consider the variables $\eta_0, \eta_1, \ldots, \eta_N$, defined by $\eta_0 = \xi_0$ and $\eta_n = \frac{\xi_{n-1} + \xi_n}{2}$, $1 \le n \le N$.

(a) Is the sequence $\eta_0, \eta_1, \ldots, \eta_N$ Markovian?

(b) Is the sequence $\zeta_0, \zeta_1, \ldots, \zeta_N$, given by $\zeta_0 = \xi_0$ and $\zeta_n = \xi_{n-1}\xi_n$, $1 \le n \le N$, Markovian?


Problem 1.12.8. Let $X_1, \ldots, X_n$ be any collection of independent and identically distributed random variables. With any such collection one can associate what is known as the order statistics, defined as the sequence $X_1^{(n)}, \ldots, X_n^{(n)}$ obtained by arranging the values $X_1, \ldots, X_n$ in non-decreasing order. (So that $X_1^{(n)} = \min(X_1, \ldots, X_n)$, ..., $X_n^{(n)} = \max(X_1, \ldots, X_n)$, with the understanding that when $X_{i_1} = \cdots = X_{i_k} = \min(X_1, \ldots, X_n)$ and $i_1 < \cdots < i_k$, then $X_1^{(n)}$ is the variable $X_{i_1}$. A similar convention is made in all analogous cases; see Problem 2.8.19, for example.)

Note that, in general, the elements of the order statistics $X_1^{(n)}, \ldots, X_n^{(n)}$ (known as rank statistics) will not be independent even if the variables $X_1, \ldots, X_n$ are.

Prove that when each variable $X_i$ takes only two values, the rank statistics form a Markov chain. Prove by way of example that, in general, this claim cannot be made if each $X_i$ takes three values. (Note that if each $X_i$ is continuously distributed (see [P §2.3]), then the rank statistics always form a Markov chain.)


Chapter 2 
Mathematical Foundations 
of Probability Theory 


2.1 Probabilistic Models of Experiments with Infinitely 
Many Outcomes: Kolmogorov’s Axioms 


Problem 2.1.1. Let $\Omega = \{r : r \in [0, 1] \cap \mathbb{Q}\}$ denote the set of all rational numbers inside the interval $[0, 1]$, let $\mathscr{A}$ be the algebra of sets that can be expressed as finite unions of non-intersecting sets $A$ of the form $\{r : a < r < b\}$, $\{r : a \le r < b\}$, $\{r : a < r \le b\}$, or $\{r : a \le r \le b\}$, and let $P(A) = b - a$. Prove that the set-function $P(A)$, $A \in \mathscr{A}$, is finitely additive but not countably additive.


Problem 2.1.2. Let $\Omega$ be any countable set and let $\mathscr{F}$ denote the collection of all subsets of $\Omega$. Set $\mu(A) = 0$ if $A$ is finite and $\mu(A) = \infty$ if $A$ is infinite. Prove that the set-function $\mu$ is finitely additive but not countably additive.


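Problem 2.1.3. Let $\mu$ be any countably additive measure on $(\Omega, \mathscr{F})$. Prove that
(a) If $A_n \uparrow A$, then $\mu(A_n) \uparrow \mu(A)$;
(b) If $A_n \downarrow A$ and $\mu(A_k) < \infty$ for some $k$, then $\mu(A_n) \downarrow \mu(A)$;
(c) If $\mu$ is finite ($\mu(\Omega) < \infty$) and $A = \lim A_n$, i.e., $A = \overline{\lim}\, A_n = \underline{\lim}\, A_n$, then $\mu(A) = \lim \mu(A_n)$.
(This problem continues in Problem 2.1.15.)
Hint. Use the relations

$$\underline{\lim}\, A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k \quad \text{and} \quad \overline{\lim}\, A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k.$$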


Problem 2.1.4. Verify the following properties of the symmetric difference between sets:

$$(A \triangle B) \triangle C = A \triangle (B \triangle C), \qquad (A \triangle B) \triangle (B \triangle C) = A \triangle C,$$
$$A \triangle B = C \iff A = B \triangle C.$$


Problem 2.1.5. Prove that the "metrics" $\rho_1(A, B)$ and $\rho_2(A, B)$, defined by

$$\rho_1(A, B) = P(A \triangle B),$$

$$\rho_2(A, B) = \begin{cases} \dfrac{P(A \triangle B)}{P(A \cup B)}, & \text{if } P(A \cup B) \ne 0, \\ 0, & \text{if } P(A \cup B) = 0, \end{cases}$$

where $A \triangle B$ is the symmetric difference of $A$ and $B$, satisfy the "triangular inequality."
Hint. Use the relation $A \triangle C \subseteq (A \triangle B) \cup (B \triangle C)$.
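
The set relations in Problems 2.1.4 and 2.1.5 can be checked exhaustively on a small universe before attempting a proof. The sketch below is only such a sanity check: it relies on Python's built-in `^` operator for the symmetric difference of sets, and the helper names are introduced here for illustration.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def check_symmetric_difference(universe):
    """Check the identities of Problem 2.1.4 and the inclusion used in the
    hint to Problem 2.1.5 for every triple of subsets of the universe."""
    for A, B, C in product(subsets(universe), repeat=3):
        if (A ^ B) ^ C != A ^ (B ^ C):            # associativity
            return False
        if (A ^ B) ^ (B ^ C) != A ^ C:            # cancellation of B
            return False
        if ((A ^ B) == C) != (A == (B ^ C)):      # equivalence
            return False
        if not (A ^ C) <= ((A ^ B) | (B ^ C)):    # inclusion from the hint
            return False
    return True

assert check_symmetric_difference({0, 1, 2})
```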


Problem 2.1.6. Let $\mu$ be any finitely additive measure on some algebra $\mathscr{A}$. Show that if the sets $A_1, A_2, \ldots \in \mathscr{A}$ are non-intersecting and, in addition, one has $A = \sum_{i=1}^{\infty} A_i \in \mathscr{A}$, then one can claim that $\mu(A) \ge \sum_{i=1}^{\infty} \mu(A_i)$.


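Problem 2.1.7. Prove that

$$\overline{\limsup A_n} = \liminf \bar{A}_n, \qquad \overline{\liminf A_n} = \limsup \bar{A}_n,$$
$$\liminf A_n \subseteq \limsup A_n, \qquad \limsup (A_n \cup B_n) = \limsup A_n \cup \limsup B_n,$$
$$\liminf (A_n \cap B_n) = \liminf A_n \cap \liminf B_n,$$
$$\limsup A_n \cap \liminf B_n \subseteq \limsup (A_n \cap B_n) \subseteq \limsup A_n \cap \limsup B_n.$$

Prove also that if $A_n \uparrow A$, or if $A_n \downarrow A$, then

$$\liminf A_n = \limsup A_n.$$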


Problem 2.1.8. Let $(x_n)$ be any sequence of real numbers and let $A_n = (-\infty, x_n)$. Prove that for $x = \limsup x_n$ and $A = \limsup A_n$ one has $(-\infty, x) \subseteq A \subseteq (-\infty, x]$. (In other words, $A$ must be either $(-\infty, x)$ or $(-\infty, x]$.)


Problem 2.1.9. Let $A_1, A_2, \ldots$ be any sequence of subsets of the set $\Omega$. Prove that

$$\limsup (A_n \setminus A_{n+1}) = \limsup (A_{n+1} \setminus A_n) = (\limsup A_n) \setminus (\liminf A_n).$$


Problem 2.1.10. Give an example showing that, in general, a measure that can take the value $+\infty$ could be countably additive, but still not continuous at "the zero" $\varnothing$.


Problem 2.1.11. We say that the events $\{A_i \in \mathscr{F} : 1 \le i \le n\}$ are exchangeable (or interchangeable) if all probabilities $P(A_{i_1} \cdots A_{i_l})$ are identical ($= p_l$) for every choice of the indices $1 \le i_1 < \cdots < i_l \le n$, and this property holds separately for every $1 \le l \le n$. Prove that for such events the following "inclusion-exclusion" formula is in place (see Problem 1.1.12):

$$P\Big(\bigcup_{i=1}^{n} A_i\Big) = n p_1 - C_n^2\, p_2 + C_n^3\, p_3 - \cdots + (-1)^{n-1} p_n.$$


Problem 2.1.12. Let $(A_k)_{k \ge 1}$ be any infinite sequence of exchangeable events, i.e., for every $l \ge 1$ one can claim that the probability $P(A_{i_1} \cdots A_{i_l})$ ($= p_l$) does not depend on the choice of the indices $1 \le i_1 < \cdots < i_l$. Prove that in any such situation one has

$$P(\underline{\lim}\, A_n) = P\Big(\bigcap_{k=1}^{\infty} A_k\Big) = \lim_{l \to \infty} p_l,$$

$$P(\overline{\lim}\, A_n) = P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = 1 - \lim_{l \to \infty} (-1)^l \Delta^l(p_0),$$

where $p_0 = 1$, $\Delta^1(p_n) = p_{n+1} - p_n$, $\Delta^l(p_n) = \Delta^1(\Delta^{l-1}(p_n))$, $l \ge 2$.


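Problem 2.1.13. Let $(A_n)_{n \ge 1}$ be any sequence of sets and let $I(A_n)$, $n \ge 1$, be the associated sequence of indicator functions. Prove that

(a) $I(\underline{\lim}\, A_n) = \underline{\lim}\, I(A_n)$, $\quad I(\overline{\lim}\, A_n) = \overline{\lim}\, I(A_n)$,

(b) $\overline{\lim}\, I(A_n) - \underline{\lim}\, I(A_n) = I(\overline{\lim}\, A_n \setminus \underline{\lim}\, A_n)$,

(c) $I\Big(\bigcup_{n=1}^{\infty} A_n\Big) \le \sum_{n=1}^{\infty} I(A_n)$.


Problem 2.1.14. Prove that

$$I\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \max_{n \ge 1} I(A_n), \qquad I\Big(\bigcap_{n=1}^{\infty} A_n\Big) = \min_{n \ge 1} I(A_n).$$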


Problem 2.1.15. (Continuation of Problem 2.1.3.) Let $\mu$ be any countably additive measure on $(\Omega, \mathscr{F})$. Prove that

(a) $\mu(\underline{\lim}\, A_n) \le \underline{\lim}\, \mu(A_n)$.

(b) If, in addition, the measure $\mu$ happens to be finite ($\mu(\Omega) < \infty$), then

$$\mu(\overline{\lim}\, A_n) \ge \overline{\lim}\, \mu(A_n).$$

(c) In the special case of probability measures $P$, one has

$$P(\underline{\lim}\, A_n) \le \underline{\lim}\, P(A_n) \le \overline{\lim}\, P(A_n) \le P(\overline{\lim}\, A_n)$$

("Fatou's lemma for sets").

Deduce from the above relations the following generalization of the "continuity" properties (2) and (3) of the probability $P$, mentioned in the Theorem of [P §2.1, 2]: if $A = \lim_n A_n$ (i.e., $\overline{\lim}\, A_n = \underline{\lim}\, A_n = A$), then $P(A) = \lim_n P(A_n)$.


Problem 2.1.16. Let $A^* = \overline{\lim}\, A_n$ and let $A_* = \underline{\lim}\, A_n$. Prove that $P(A_n - A^*) \to 0$ and $P(A_* - A_n) \to 0$.


Problem 2.1.17. Suppose that $A_n \to A$ (in the sense that $A = A^* = A_*$; see Problem 2.1.16). Prove that $P(A \triangle A_n) \to 0$.


Problem 2.1.18. Suppose that the sets $A_n$ converge to the set $A$, in the sense that $P(A \triangle \overline{\lim}\, A_n) = P(A \triangle \underline{\lim}\, A_n) = 0$. Prove that in that case one must have $P(A \triangle A_n) \to 0$.


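Problem 2.1.19. Let $A_0, A_1, \ldots$ and $B_0, B_1, \ldots$ be any two sequences of subsets of $\Omega$. Verify the following properties of the symmetric difference:

$$A_0 \triangle B_0 = \bar{A}_0 \triangle \bar{B}_0,$$

$$A_0 \triangle \Big(\bigcup_{n \ge 1} B_n\Big) \subseteq \bigcup_{n \ge 1} (A_0 \triangle B_n),$$

$$A_0 \triangle \Big(\bigcap_{n \ge 1} B_n\Big) \supseteq \bigcap_{n \ge 1} (A_0 \triangle B_n),$$

$$\Big(\bigcup_{n \ge 1} A_n\Big) \triangle \Big(\bigcup_{n \ge 1} B_n\Big) \subseteq \bigcup_{n \ge 1} (A_n \triangle B_n),$$

$$\Big(\bigcap_{n \ge 1} A_n\Big) \triangle \Big(\bigcap_{n \ge 1} B_n\Big) \subseteq \bigcup_{n \ge 1} (A_n \triangle B_n).$$


Problem 2.1.20. Let $A$, $B$, $C$ be any three random events. Prove that

$$|P(A \cap B) - P(A \cap C)| \le P(B \triangle C).$$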


Problem 2.1.21. Prove that for any three events, $A$, $B$ and $C$, the probability that exactly one of these events will occur can be expressed as $P(A) + P(B) + P(C) - 2(P(AB) + P(AC) + P(BC)) + 3P(ABC)$. (Comp. with Problem 1.1.13.)
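
The "exactly one event" formula of Problem 2.1.21 (with the pairwise terms $P(AB)$, $P(AC)$, $P(BC)$) can be illustrated on a finite model. In the sketch below the sample space consists of the eight atoms indexed by which of $A$, $B$, $C$ occur; the particular weights are an arbitrary assumption made for the example, and the check is a numerical illustration rather than a proof.

```python
from fractions import Fraction
from itertools import product

# Atoms (a, b, c): a = 1 means "A occurs", and similarly for b and c.
atoms = list(product((0, 1), repeat=3))
weights = [Fraction(w, 36) for w in (1, 2, 3, 4, 5, 6, 7, 8)]  # weights sum to 1
prob = dict(zip(atoms, weights))

def P(event):
    """Probability of the set of atoms on which `event` is true."""
    return sum(p for atom, p in prob.items() if event(atom))

def exactly_one_direct():
    return P(lambda w: sum(w) == 1)

def exactly_one_formula():
    pA, pB, pC = (P(lambda w, i=i: w[i] == 1) for i in range(3))
    pAB = P(lambda w: w[0] == 1 and w[1] == 1)
    pAC = P(lambda w: w[0] == 1 and w[2] == 1)
    pBC = P(lambda w: w[1] == 1 and w[2] == 1)
    pABC = P(lambda w: sum(w) == 3)
    return pA + pB + pC - 2 * (pAB + pAC + pBC) + 3 * pABC

assert sum(weights) == 1
assert exactly_one_direct() == exactly_one_formula()
```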


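Problem 2.1.22. Let $(A_n)_{n \ge 1}$ be any sequence of events in $\mathscr{F}$, for which

$$\sum_{n=1}^{\infty} P(A_n \triangle A_{n+1}) < \infty.$$

Prove that

$$P\{(\overline{\lim}\, A_n) \triangle (\underline{\lim}\, A_n)\} = 0.$$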


Problem 2.1.23. Prove that, for any two events $A$ and $B$, one has

$$\max(P(A), P(B)) \le P(A \cup B) \le 2\max(P(A), P(B))$$

and

$$P(A \cup B)\, P(A \cap B) \le P(A)\, P(B).$$

When can one claim that the last relation is actually an identity?
Show also the Boole inequality: $P(A \cap B) \ge 1 - P(\bar{A}) - P(\bar{B})$.


Problem 2.1.24. Let $(A_n)_{n \ge 1}$ and $(B_n)_{n \ge 1}$ be any two sequences of events chosen so that $A_n \subseteq B_n$ for every $n \ge 1$. Prove that $\{A_n \text{ i.o.}\} \subseteq \{B_n \text{ i.o.}\}$ ("i.o." stands for "infinitely often," meaning that infinitely many events in the associated sequence occur).


Problem 2.1.25. Suppose again that $(A_n)_{n \ge 1}$ and $(B_n)_{n \ge 1}$ are two sequences of events such that

$$P\{A_n \text{ i.o.}\} = 1 \quad \text{and} \quad P\{\bar{B}_n \text{ i.o.}\} = 0.$$

Prove that $P\{A_n \cap B_n \text{ i.o.}\} = 1$.


Problem 2.1.26. Give an example of two finite measures, $\mu_1$ and $\mu_2$, defined on the same sample space $\Omega$ (i.e., two measures with $\mu_1(\Omega) < \infty$ and $\mu_2(\Omega) < \infty$), for which the smallest measure, $\nu$, with the property $\nu \ge \mu_1$ and $\nu \ge \mu_2$ is not, as one might think, $\max(\mu_1, \mu_2)$, but is actually $\mu_1 + \mu_2$.


Problem 2.1.27. Suppose that the measurable space $(\Omega, \mathscr{F})$ is endowed with a sequence of probability measures, $P_1, P_2, \ldots$, and suppose that $P(A)$, $A \in \mathscr{F}$, is some set-function on $\mathscr{F}$, for which the following relation holds for every $A \in \mathscr{F}$:

$$P_n(A) \to P(A).$$

Prove the following properties, known as the Vitali-Hahn-Saks theorem:

(a) The set-function $P = P(\cdot)$ is a probability measure on $(\Omega, \mathscr{F})$.

(b) For any sequence $A_1, A_2, \ldots \in \mathscr{F}$ with $A_k \downarrow \varnothing$ as $k \to \infty$, one must have $\sup_n P_n(A_k) \downarrow 0$ as $k \to \infty$.


Problem 2.1.28. Consider the measurable space $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ and give an example of a sequence of measures $\mu_n = \mu_n(A)$, $A \in \mathscr{B}(\mathbb{R})$, $n \ge 1$, such that for every $A \in \mathscr{B}(\mathbb{R})$ the sequence $(\mu_n(A))_{n \ge 1}$ is decreasing, but the limit $\nu(A) = \lim_n \mu_n(A)$, $A \in \mathscr{B}(\mathbb{R})$, does not represent a countably additive set function and, therefore, cannot be treated as a measure.


Problem 2.1.29. Let $(\Omega, \mathscr{F}, P)$ be any probability space and let $(A_n)_{n \ge 1}$ be any sequence of events inside $\mathscr{F}$. Suppose that $P(A_n) \ge c > 0$, $n \ge 1$, and let $A = \overline{\lim}\, A_n$. Prove that $P(A) \ge c$.


Problem 2.1.30. (The Huygens problem.) Two players, A and B, take turns tossing two fair dice. Player A wins if he gets a six before player B gets a seven (otherwise player A loses and player B wins). Assuming that player A tosses first, what is the probability that player A wins?

Hint. One must calculate the probability $P_A = \sum_{k=0}^{\infty} P(A_k)$, where $A_k$ is the event that player A wins after the $(k+1)$-st turn. (The answer is: $P_A = 30/61$.)
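
The answer to the Huygens problem can be reproduced with exact arithmetic. The sketch below assumes that "a six" and "a seven" refer to the sum of the two dice, so that the per-turn success probabilities are 5/36 and 6/36; the function name `first_player_wins` is introduced only for this illustration.

```python
from fractions import Fraction

def first_player_wins(a, b):
    """Probability that the first player succeeds before the second one,
    where a and b are the per-turn success probabilities of the two players."""
    both_fail = (1 - a) * (1 - b)   # one full round without a winner
    # P_A = sum over k >= 0 of both_fail**k * a, a geometric series
    return a / (1 - both_fail)

p_six = Fraction(5, 36)     # two dice sum to 6 (assumed reading of "a six")
p_seven = Fraction(6, 36)   # two dice sum to 7 (assumed reading of "a seven")

print(first_player_wins(p_six, p_seven))   # 30/61
```

Summing the first few terms $P(A_k) = \big((1-a)(1-b)\big)^k a$ by hand gives partial sums that increase towards the same value.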
Problem 2.1.31. Suppose that the set $\Omega$ is at most countable and let $\mathscr{F}$ be any $\sigma$-algebra of subsets of $\Omega$. Prove that it is always possible to find a partition $\mathscr{D} = \{D_1, D_2, \ldots\}$ ($\bigcup_{i \in N} D_i = \Omega$, $D_i \cap D_j = \varnothing$, $i \ne j$; $N = \{1, 2, \ldots\}$) that generates $\mathscr{F}$, i.e.,

$$\mathscr{F} = \Big\{\bigcup_{i \in M} D_i : M \subseteq N\Big\}.$$

(Comp. with the analogous statement for the case of a finite set $\Omega$ formulated in [P §1.1, 3].)

Hint. Consider constructing $\mathscr{D}$ from the equivalence classes in the set $\Omega$ associated with the relation

$$\omega_1 \sim \omega_2 \iff (\omega_1 \in A \iff \omega_2 \in A \text{ for every } A \in \mathscr{F}).$$


2.2 Algebras and σ-algebras: Measurable Spaces


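Problem 2.2.1. Let $\mathscr{B}_1$ and $\mathscr{B}_2$ be any two $\sigma$-algebras of subsets of the space $\Omega$. Can one claim that the following collections of sets form $\sigma$-algebras:

$$\mathscr{B}_1 \cap \mathscr{B}_2 = \{A : A \in \mathscr{B}_1 \text{ and } A \in \mathscr{B}_2\},$$
$$\mathscr{B}_1 \cup \mathscr{B}_2 = \{A : A \in \mathscr{B}_1 \text{ or } A \in \mathscr{B}_2\}?$$

Let $\mathscr{B}_1 \vee \mathscr{B}_2$ be the smallest $\sigma$-algebra, $\sigma(\mathscr{B}_1, \mathscr{B}_2)$, that contains $\mathscr{B}_1$ and $\mathscr{B}_2$. Prove that $\mathscr{B}_1 \vee \mathscr{B}_2$ coincides with the smallest $\sigma$-algebra that contains all sets of the form $B_1 \cap B_2$, for all choices of $B_1 \in \mathscr{B}_1$ and $B_2 \in \mathscr{B}_2$.

Hint. Convince yourself that $\mathscr{B}_1 \cap \mathscr{B}_2$ is a $\sigma$-algebra and prove by way of example that, in general, $\mathscr{B}_1 \cup \mathscr{B}_2$ is not a $\sigma$-algebra. (Such an example can be constructed with a set $\Omega$ that has only three elements.)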
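
One possible three-element example of the kind mentioned in the hint can be verified mechanically. The sketch below is an illustration under stated assumptions: the two families `B1` and `B2` are one particular choice among many, and on a finite set closure under complements and pairwise unions is all that a $\sigma$-algebra requires.

```python
OMEGA = frozenset({1, 2, 3})

def is_algebra(family, omega=OMEGA):
    """On a finite set: contains omega, closed under complement and union."""
    fam = set(family)
    if omega not in fam:
        return False
    closed_under_complement = all(omega - a in fam for a in fam)
    closed_under_union = all(a | b in fam for a in fam for b in fam)
    return closed_under_complement and closed_under_union

B1 = {frozenset(), frozenset({1}), frozenset({2, 3}), OMEGA}
B2 = {frozenset(), frozenset({2}), frozenset({1, 3}), OMEGA}

assert is_algebra(B1) and is_algebra(B2)
assert is_algebra(B1 & B2)        # the intersection is again an algebra
assert not is_algebra(B1 | B2)    # {1} | {2} = {1, 2} is missing from the union
```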


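Problem 2.2.2. Let $\mathscr{D} = \{D_1, D_2, \ldots\}$ be any countable partition of the set $\Omega$ and let $\mathscr{B} = \sigma(\mathscr{D})$. What is the cardinality of the $\sigma$-algebra $\mathscr{B}$?

Hint. With any sequence $x = (x_1, x_2, \ldots)$ that consists of 0's and 1's, one can associate the set $D^x = D_1^{x_1} \cup D_2^{x_2} \cup \cdots$, where $D_i^{x_i} = \varnothing$ if $x_i = 0$, and $D_i^{x_i} = D_i$ if $x_i = 1$.


Problem 2.2.3. Prove that

$$\mathscr{B}(\mathbb{R}^n) \otimes \mathscr{B}(\mathbb{R}) = \mathscr{B}(\mathbb{R}^{n+1}).$$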


Problem 2.2.4. Prove that the sets (b)-(f) from [P §2.2, 4] belong to $\mathscr{B}(\mathbb{R}^\infty)$.
Hint. In order to show, for example, (b), notice that

$$\{x : \overline{\lim}\, x_n \le a\} = \bigcap_{k=1}^{\infty} \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} \Big\{x : x_n < a + \frac{1}{k}\Big\}.$$

In other words, $\overline{\lim}_n\, x_n \le a \iff \forall k \in \mathbb{N}\ \exists m \in \mathbb{N} : \forall n \ge m,\ x_n < a + \frac{1}{k}$. One can derive (c)-(f) with a similar argument.


Problem 2.2.5. Prove that the sets $A_2$ and $A_3$ from [P §2.2, 5] do not belong to $\mathscr{B}(\mathbb{R}^{[0,1]})$.
Hint. As is the case with the set $A_1$, the desired property of $A_2$ and $A_3$ can be established by contradiction.


Problem 2.2.6. Verify that the function in [P §2.2, (18)] is indeed a metric.


Problem 2.2.7. Prove that $\mathscr{B}_0(\mathbb{R}^n) = \mathscr{B}(\mathbb{R}^n)$, $n \ge 1$, and $\mathscr{B}_0(\mathbb{R}^\infty) = \mathscr{B}(\mathbb{R}^\infty)$.


Problem 2.2.8. Let $C = C[0, \infty)$ be the space of continuous functions $x = (x_t)$, defined for $t \ge 0$. Prove that, relative to the metric

$$\rho(x, y) = \sum_{n=1}^{\infty} 2^{-n} \min\Big[\sup_{0 \le t \le n} |x_t - y_t|,\ 1\Big], \quad x, y \in C,$$

this space is a Polish space (just as $C = C[0, 1]$), i.e., a complete, separable metric space, and the $\sigma$-algebra $\mathscr{B}_0(C)$, generated by all open sets, coincides with the $\sigma$-algebra $\mathscr{B}(C)$, generated by all cylinder sets.


Problem 2.2.9. Show the equivalence between the group of conditions $\{(\lambda_a), (\lambda_b), (\lambda_c)\}$ and the group of conditions $\{(\lambda_a), (\lambda'_b), (\lambda'_c)\}$ in [P §2.2, Definition 2].


Problem 2.2.10. Prove [P §2.2, Theorem 2] by using the statement in [P §2.2, Theorem 1].


Problem 2.2.11. In the context of [P §2.2, Theorem 3], prove that the system $\mathscr{L}$ is a $\lambda$-system.


Problem 2.2.12. A $\sigma$-algebra is said to be countably generated, or separable, if it is generated by some countable collection of sets.

Prove that the $\sigma$-algebra $\mathscr{B}$, comprised of all Borel sets inside $\Omega = (0, 1]$, is countably generated.

Prove by way of example that it is possible to find two $\sigma$-algebras, $\mathscr{F}_1$ and $\mathscr{F}_2$, such that $\mathscr{F}_2$ is countably generated, one has $\mathscr{F}_1 \subseteq \mathscr{F}_2$, and yet $\mathscr{F}_1$ is not countably generated.


Problem 2.2.13. Prove that, in order for the $\sigma$-algebra $\mathscr{G}$ to be countably generated, it is necessary and sufficient that $\mathscr{G} = \sigma(X)$, for some appropriate random variable $X$ (see [P §2.2, 4] for the definition of $\sigma(X)$).


Problem 2.2.14. Prove that $(X_1, X_2, \ldots)$ are independent random variables ([P §2.2, 4] and [P §2.2, 5]) whenever $\sigma(X_n)$ and $\sigma(X_1, \ldots, X_{n-1})$ are independent for every $n > 1$.


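Problem 2.2.15. Let $(\Omega, \mathscr{F}, P)$ be any complete (see [P §2.3, 1] and Problem 2.2.34) probability space, let $\mathscr{G}$ be any sub-$\sigma$-algebra of $\mathscr{F}$ ($\mathscr{G} \subseteq \mathscr{F}$) and let $(\mathscr{E}_n)_{n \ge 1}$ be any non-increasing sequence of sub-$\sigma$-algebras of $\mathscr{F}$ ($\mathscr{E}_1 \supseteq \mathscr{E}_2 \supseteq \cdots$; $\mathscr{E}_n \subseteq \mathscr{F}$, $n \ge 1$). Suppose that all $\sigma$-algebras under consideration are completed with all P-negligible sets from $\mathscr{F}$. It may appear intuitive that, at least up to sets of measure zero, one must have

$$\bigcap_n \sigma(\mathscr{G}, \mathscr{E}_n) = \sigma\Big(\mathscr{G}, \bigcap_n \mathscr{E}_n\Big), \qquad (*)$$

or, in a different notation,

$$\bigcap_n (\mathscr{G} \vee \mathscr{E}_n) = \mathscr{G} \vee \bigcap_n \mathscr{E}_n, \qquad (**)$$

where $\mathscr{G} \vee \mathscr{E}_n \equiv \sigma(\mathscr{G}, \mathscr{E}_n)$ is the smallest $\sigma$-algebra generated by the sets from $\mathscr{G}$ and $\mathscr{E}_n$, and the identity in (*) and (**) is understood as "identity up to sets of measure zero" between two complete $\sigma$-algebras, say $\mathscr{H}_1 \subseteq \mathscr{F}$ and $\mathscr{H}_2 \subseteq \mathscr{F}$, in the sense that, for every $A \in \mathscr{H}_1$ one can find some $B \in \mathscr{H}_2$ (and vice versa, for every $B \in \mathscr{H}_2$ one can find some $A \in \mathscr{H}_1$) so that $P(A \triangle B) = 0$.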

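Nevertheless, the following example taken from [134] shows that, in general, the operations $\vee$ (the supremum) and $\cap$ (the intersection) between $\sigma$-algebras cannot be interchanged.

(a) Let $\xi_0, \xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random variables with $P\{\xi_i = 1\} = P\{\xi_i = -1\} = 1/2$, and let $X_n = \xi_0 \xi_1 \cdots \xi_n$,

$$\mathscr{G} = \sigma(\xi_1, \xi_2, \ldots), \quad \text{and} \quad \mathscr{E}_n = \sigma(X_k,\ k > n).$$

Prove that

$$\bigcap_n \sigma(\mathscr{G}, \mathscr{E}_n) \ne \sigma\Big(\mathscr{G}, \bigcap_n \mathscr{E}_n\Big).$$

Hint. Prove that $\xi_0$ is measurable with respect to $\bigcap_n \sigma(\mathscr{G}, \mathscr{E}_n)$ ($= \sigma(\mathscr{G}, \mathscr{E}_1)$), but is still independent from the events in $\sigma(\mathscr{G}, \bigcap_n \mathscr{E}_n)$ ($= \mathscr{G}$).

(This problem continues in Problem 7.4.25 below.)

(b) The fact that for $\sigma$-algebras the operations $\vee$ and $\cap$ do not commute follows from the (considerably simpler than (a)) claim (see [23]) that, if $\xi_1$ and $\xi_2$ are any two of the random variables described in (a) (i.e., any two independent and symmetric Bernoulli random variables) and if $\mathscr{E}_1 = \sigma(\xi_1)$, $\mathscr{E}_2 = \sigma(\xi_2)$ and $\mathscr{G} = \sigma(\xi_1 \xi_2)$, then

$$(\mathscr{G} \vee \mathscr{E}_1) \cap (\mathscr{G} \vee \mathscr{E}_2) \ne \mathscr{G} \vee (\mathscr{E}_1 \cap \mathscr{E}_2) \quad \text{and} \quad (\mathscr{G} \cap \mathscr{E}_1) \vee (\mathscr{G} \cap \mathscr{E}_2) \ne \mathscr{G} \cap (\mathscr{E}_1 \vee \mathscr{E}_2).$$

Prove this last statement.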


Problem 2.2.16. Let $\mathscr{A}_1$ and $\mathscr{A}_2$ be any two independent collections of sets, every one of which represents a $\pi$-system. Prove that $\sigma(\mathscr{A}_1)$ and $\sigma(\mathscr{A}_2)$ are also independent. Give an example of two independent collections of sets, $\mathscr{A}_1$ and $\mathscr{A}_2$, neither of which is a $\pi$-system, for which $\sigma(\mathscr{A}_1)$ and $\sigma(\mathscr{A}_2)$ are not independent.


Problem 2.2.17. Assuming that $\mathscr{L}$ is a $\lambda$-system, prove that $(A, B \in \mathscr{L},\ A \cap B = \varnothing) \Longrightarrow (A \cup B \in \mathscr{L})$.


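Problem 2.2.18. Let $\mathscr{F}_1$ and $\mathscr{F}_2$ be any two $\sigma$-algebras of subsets of the set $\Omega$ and let

$$d(\mathscr{F}_1, \mathscr{F}_2) = 4 \sup_{A_1 \in \mathscr{F}_1,\ A_2 \in \mathscr{F}_2} |P(A_1 A_2) - P(A_1) P(A_2)|.$$

Prove that the above quantity, which can be viewed as a measure of the dependence between $\mathscr{F}_1$ and $\mathscr{F}_2$, has the following properties:

(a) $0 \le d(\mathscr{F}_1, \mathscr{F}_2) \le 1$;

(b) $d(\mathscr{F}_1, \mathscr{F}_2) = 0$ if and only if $\mathscr{F}_1$ and $\mathscr{F}_2$ are independent;

(c) $d(\mathscr{F}_1, \mathscr{F}_2) = 1$ if and only if the intersection $\mathscr{F}_1 \cap \mathscr{F}_2$ contains an event that has probability equal to 1/2.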


Problem 2.2.19. Following the proof of [P §2.2, Lemma 1], prove the existence and the uniqueness of the classes $\lambda(\mathscr{E})$ and $\pi(\mathscr{E})$, which contain the system of sets $\mathscr{E}$.


Problem 2.2.20. Let $\mathscr{A}$ be any algebra of sets that has the following property: for any sequence, $(A_n)_{n \ge 1}$, of non-intersecting sets $A_n \in \mathscr{A}$, one has $\bigcup_{n=1}^{\infty} A_n \in \mathscr{A}$. Prove that $\mathscr{A}$ is actually a $\sigma$-algebra.


Problem 2.2.21. Suppose that $(\mathscr{F}_n)_{n \ge 1}$ is some increasing sequence of $\sigma$-algebras, i.e., $\mathscr{F}_n \subseteq \mathscr{F}_{n+1}$, $n \ge 1$. Prove that, generally, $\bigcup_{n=1}^{\infty} \mathscr{F}_n$ could only be claimed to be an algebra.


Problem 2.2.22. Let $\mathscr{F}$ be any algebra (resp., $\sigma$-algebra) and let $C$ be any set which does not belong to $\mathscr{F}$. Consider the smallest algebra (resp., $\sigma$-algebra) which contains the family $\mathscr{F} \cup \{C\}$. Prove that this algebra (resp., $\sigma$-algebra) is comprised of all sets of the form $(A \cap C) \cup (B \cap \bar{C})$, for all choices of $A, B \in \mathscr{F}$.


Problem 2.2.23. Let $\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty\} \cup \{\infty\}$ be the extended real line. The Borel $\sigma$-algebra $\mathscr{B}(\bar{\mathbb{R}})$ may be defined (comp. with [P §2.2, 2]) as the $\sigma$-algebra generated by the sets $[-\infty, x]$, $x \in \mathbb{R}$, where $[-\infty, x] = \{-\infty\} \cup (-\infty, x]$. Prove that the $\sigma$-algebra $\mathscr{B}(\bar{\mathbb{R}})$ coincides with any of the $\sigma$-algebras generated, respectively, by any of the following families of sets:

(a) $[-\infty, x)$, $x \in \mathbb{R}$;

(b) $(x, \infty]$, $x \in \mathbb{R}$ (where $(x, \infty] = (x, \infty) \cup \{\infty\}$);

(c) All finite intervals and $\{-\infty\}$ and $\{\infty\}$.


Problem 2.2.24. Consider the measurable space $(C, \mathscr{B}_0(C))$, in which $C = C[0, 1]$ is the space of all continuous functions $x = (x_t)_{0 \le t \le 1}$ and $\mathscr{B}_0(C)$ is the Borel $\sigma$-algebra for the metric $\rho(x, y) = \sup_{0 \le t \le 1} |x_t - y_t|$. Prove that:

(a) The space $C$ is complete (relative to the metric $\rho(\cdot, \cdot)$).

(b) The space $C$ is separable (relative to the metric $\rho(\cdot, \cdot)$).

Hint. Use the Bernstein polynomials; see [P §1.5].

(c) If treated as a subset of $C[0, 1]$, the subspace

$$C^{d}[0, 1] = \{x \in C[0, 1] : x \text{ is differentiable}\}$$

is not a Borel set.


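Problem 2.2.25. Let $\mathscr{A}_0$ be any non-empty system of subsets of the sample space $\Omega$. Prove that $\alpha(\mathscr{A}_0)$, defined as the algebra generated by the system $\mathscr{A}_0$, can be constructed as follows: Set $\mathscr{A}_1 = \mathscr{A}_0 \cup \{\varnothing, \Omega\}$ and define, for $n \ge 1$,

$$\mathscr{A}_{n+1} = \{A \cup \bar{B} : A, B \in \mathscr{A}_n\}.$$

Then $\mathscr{A}_0 \subseteq \mathscr{A}_1 \subseteq \cdots \subseteq \mathscr{A}_n \subseteq \cdots$ and

$$\alpha(\mathscr{A}_0) = \bigcup_{n=1}^{\infty} \mathscr{A}_n.$$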
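
On a finite sample space the construction of Problem 2.2.25 stabilizes after finitely many steps, so it can be run directly. The following sketch is an illustration only; it assumes the recursion in the form $\mathscr{A}_{n+1} = \{A \cup \bar{B} : A, B \in \mathscr{A}_n\}$, and the function name and the sample system are hypothetical.

```python
def generated_algebra(system, omega):
    """Iterate A_{n+1} = {A | complement(B) : A, B in A_n}, starting from
    A_1 = system + {empty set, omega}, until the family stops growing."""
    omega = frozenset(omega)
    current = {frozenset(s) for s in system} | {frozenset(), omega}
    while True:
        nxt = {a | (omega - b) for a in current for b in current}
        if nxt == current:      # closed under (A, B) -> A | complement(B)
            return current
        current = nxt

omega = {1, 2, 3, 4}
algebra = generated_algebra([{1}, {1, 2}], omega)

# The atoms are {1}, {2} and {3, 4}, so the algebra has 2**3 = 8 members.
assert len(algebra) == 8
assert frozenset({2}) in algebra and frozenset({3, 4}) in algebra
```

Each step keeps all previous sets (take $B = \Omega$), so the families increase, and a family that is stable under $A \cup \bar{B}$ and contains $\varnothing$ and $\Omega$ is closed under complements and finite unions.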


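Problem 2.2.26. For a given non-empty system of subsets of the sample space $\Omega$, denoted $\mathscr{A}_0$, in Problem 2.2.25 we gave a method for constructing the smallest algebra, $\alpha(\mathscr{A}_0)$, that contains the system $\mathscr{A}_0$. Analogously, we now define the systems:

$$\mathscr{A}_1 = \mathscr{A}_0 \cup \{\varnothing, \Omega\},$$
$$\mathscr{A}_2 = \mathscr{A}_1 \cup \Big\{\bigcup_{n=1}^{\infty} B_n : B_1, B_2, \ldots \in \mathscr{A}_1\Big\},$$
$$\mathscr{A}_3 = \mathscr{A}_2 \cup \{\bar{B} : B \in \mathscr{A}_2\},$$
$$\mathscr{A}_4 = \mathscr{A}_3 \cup \Big\{\bigcup_{n=1}^{\infty} B_n : B_1, B_2, \ldots \in \mathscr{A}_3\Big\},$$
$$\mathscr{A}_5 = \mathscr{A}_4 \cup \{\bar{B} : B \in \mathscr{A}_4\},$$
$$\mathscr{A}_6 = \mathscr{A}_5 \cup \Big\{\bigcup_{n=1}^{\infty} B_n : B_1, B_2, \ldots \in \mathscr{A}_5\Big\},$$
$$\ldots$$

It may seem intuitive that the system $\mathscr{A}_\infty = \bigcup_{n=1}^{\infty} \mathscr{A}_n$ should give the smallest $\sigma$-algebra, $\sigma(\mathscr{A}_0)$, that contains the system $\mathscr{A}_0$. However, in general, this claim cannot be made: one always has $\mathscr{A}_\infty \subseteq \sigma(\mathscr{A}_0)$, while, in general, $\mathscr{A}_\infty \ne \sigma(\mathscr{A}_0)$, i.e., the above procedure may not give the entire $\sigma$-algebra $\sigma(\mathscr{A}_0)$. Prove by way of example that $\sigma(\mathscr{A}_0)$ can be strictly larger than $\mathscr{A}_\infty$.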

Hint. Consider the case where $\mathscr{A}_0$ is the system of all intervals on the real line $\mathbb{R}$.

Note that if one is to follow the above procedure, starting with $\mathscr{A}_\infty$ instead of $\mathscr{A}_0$, in general, one still cannot claim that $\sigma(\mathscr{A}_0)$ will be produced at the end. In order to produce $\sigma(\mathscr{A}_0)$, one must use transfinite induction ($\aleph_1$ "times"). We refer to [47, vol. 1, p. 235, vol. 2, p. 1068] for further explanation of G. Cantor's cardinality (or power) numbers and the related continuum hypothesis.


Problem 2.2.27. (Suslin's counterexample.) The construction and the conclusion given in the previous problem show that, in principle, a $\sigma$-algebra may have a rather complicated structure. In 1916 M. Suslin produced a counterexample, which proved that the following statement, due to H. Lebesgue, is not true in general: the projection of every Borel set $B$ inside $\mathbb{R}^2$ onto one of the coordinate axes is a Borel set inside $\mathbb{R}^1$. Just as M. Suslin did in 1916, construct a counterexample that disproves this statement.

Hint. M. Suslin's idea was to construct a concrete sequence, $A_1, A_2, \ldots$, of open sets in the plane $\mathbb{R}^2$ so that the projection of the intersection $\bigcap_n A_n$ on one of the coordinate axes is not a Borel set.


Problem 2.2.28. (Sperner's lemma.) Consider the set $A = \{1, \ldots, n\}$ and let $\{A_1, \ldots, A_K\}$ be any family of subsets of $A$, chosen so that no member of this family is included in some other member of the same family. Prove that the total number $K$ satisfies the estimate $K \le C_n^{[n/2]}$.
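
For very small $n$ the bound $K \le C_n^{[n/2]}$ in Sperner's lemma can be confirmed by exhaustive search over all families of subsets. The sketch below is a brute-force illustration (feasible only for $n \le 4$, since there are $2^{2^n}$ families); the function name and the bitmask encoding of subsets are choices made for this example.

```python
from math import comb

def max_antichain_size(n):
    """Largest family of subsets of {1,...,n} in which no member is
    contained in another member, found by exhaustive search."""
    subsets = range(2 ** n)                      # each subset is a bitmask
    best = 0
    for family_mask in range(2 ** (2 ** n)):     # every family of subsets
        family = [s for s in subsets if (family_mask >> s) & 1]
        if len(family) <= best:
            continue
        # a is contained in b exactly when a & b == a
        if all(a & b != a for a in family for b in family if a != b):
            best = len(family)
    return best

for n in range(1, 5):
    assert max_antichain_size(n) == comb(n, n // 2)
```

The check also shows that the bound is attained for these $n$, for instance by the family of all subsets of size $[n/2]$.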


Problem 2.2.29. Let $\mathscr{E}$ be any system of subsets of $\Omega$ and let $\mathscr{F} = \sigma(\mathscr{E})$ be the smallest $\sigma$-algebra that includes the system $\mathscr{E}$ (i.e., the $\sigma$-algebra generated by $\mathscr{E}$). Suppose that $A \in \mathscr{F}$. By using the "suitable sets principle," prove that one can always find a countable family, $\mathscr{G} \subseteq \mathscr{E}$, for which one can claim that $A \in \sigma(\mathscr{G})$.


Problem 2.2.30. The Borel $\sigma$-algebra $\mathscr{E}$ associated with the metric space $(E, \rho)$ is defined as the $\sigma$-algebra generated by all open sets inside $E$ (relative to the metric $\rho$; see [P §3.1, 3]). Prove that, for certain metric spaces, the $\sigma$-algebra $\mathscr{E}_0$, generated by all open balls, may be strictly smaller than $\mathscr{E}$ ($\mathscr{E}_0 \subset \mathscr{E}$).


Problem 2.2.31. Prove that there is no $\sigma$-algebra of cardinality $\aleph_0$ ("aleph-naught," the cardinality of the set of natural numbers), that is, no $\sigma$-algebra has countably infinitely many elements. Plainly, the structure of any $\sigma$-algebra is always such that it has either finitely many elements (see Problem 1.1.10) or uncountably many elements. For example, according to the next problem, as a set, the collection of all Borel subsets of $\mathbb{R}^n$ has power $c = 2^{\aleph_0}$ (i.e., the power of the continuum), which is the same as the power of the collection of all subsets of the set of natural numbers.


Problem 2.2.32. Just as in the previous problem, let $c = 2^{\aleph_0}$ denote the power of the continuum. Prove that, as a set, the collection of all Borel subsets of $\mathbb{R}^n$ has power $c$, while the $\sigma$-algebra of all Lebesgue subsets has power $2^c$.


Problem 2.2.33. Suppose that $B$ is some Borel subset of the real line $\mathbb{R}$ and let $\lambda$ denote the Lebesgue measure on $\mathbb{R}$. The density of the set $B$ is defined as the limit

$$D(B) = \lim_{T \to \infty} \frac{\lambda\{B \cap [-T, T]\}}{2T}, \qquad (*)$$

provided that the limit exists.

(a) Give an example of a set $B$ for which the density $D(B)$ does not exist, in that the limit (*) does not exist.

(b) Prove that if $B_1$ and $B_2$ are two non-intersecting Borel subsets of the real line, then

$$D(B_1 + B_2) = D(B_1) + D(B_2),$$

in the sense that if either side of the above identity exists then so does the other side and the identity holds.

(c) Construct a sequence, $B_1, B_2, \ldots$, of Borel sets inside $\mathbb{R}$, every one of which admits density $D(B_i)$, but, nevertheless, contrary to the intuition, one has

$$D\Big(\sum_{i=1}^{\infty} B_i\Big) \ne \sum_{i=1}^{\infty} D(B_i).$$


i=l i=l 


Problem 2.2.34. (Completion of $\sigma$-algebras.) Let $(\Omega, \mathscr{F}, P)$ be any probability space. We say that this probability space is complete (or, equivalently, P-complete, or complete relative to the measure $P$), if $B \in \mathscr{F}$ and $P(B) = 0$ implies that any set $A$ with $A \subseteq B$ must be an element of $\mathscr{F}$.

Let $\mathscr{N}$ denote the collection of all subsets $N \subseteq \Omega$ with the property that there is a set (possibly depending on $N$), $B_N \in \mathscr{F}$, with $P(B_N) = 0$ and $N \subseteq B_N$. Let $\bar{\mathscr{F}}$ (sometimes written as $\mathscr{F}^P$ or $\bar{\mathscr{F}}^P$) denote the collection of all sets of the form $A \cup N$, for some choice of $A \in \mathscr{F}$ and $N \in \mathscr{N}$. Prove that:

(a) $\bar{\mathscr{F}}$ is a $\sigma$-algebra;

(b) If $B \subseteq \Omega$ and there are sets, $A_1$ and $A_2$, from $\mathscr{F}$, with $A_1 \subseteq B \subseteq A_2$ and $P(A_2 \setminus A_1) = 0$, then $B \in \bar{\mathscr{F}}$;

(c) The probability space $(\Omega, \bar{\mathscr{F}}, P)$ is complete.


2.3 Methods for Constructing Probability Measures 
on Measurable Spaces 


Problem 2.3.1. Suppose that $P$ is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and let
$F(x) = P(-\infty, x]$, $x \in \mathbb{R}$. Prove that

$$P(a, b] = F(b) - F(a), \qquad P(a, b) = F(b-) - F(a),$$
$$P[a, b] = F(b) - F(a-), \qquad P[a, b) = F(b-) - F(a-),$$
$$P(\{x\}) = F(x) - F(x-),$$

where $F(x-) = \lim_{y \uparrow x} F(y)$.

Problem 2.3.2. Verify formula [P §2.3, (7)].


Problem 2.3.3. Give a complete proof of [P §2.3, Theorem 2].


Problem 2.3.4. Prove that a (cumulative) distribution function $F = F(x)$, defined
on the real line $\mathbb{R}$, can have at most countably many points of discontinuity. Does
this statement have an analog for distribution functions defined on $\mathbb{R}^n$?

Hint. Consider using the relation

$$\{x : F(x) \ne F(x-)\} = \bigcup_{n=1}^{\infty} \Big\{x : F(x) - F(x-) \ge \frac{1}{n}\Big\}.$$

In general, for distribution functions defined on $\mathbb{R}^n$, one cannot claim
that the points of discontinuity are at most countably many. To find a
counterexample, consider the delta-measure

$$\delta_0(A) = \begin{cases} 1, & \text{if } 0 \in A, \\ 0, & \text{if } 0 \notin A, \end{cases} \qquad A \in \mathcal{B}(\mathbb{R}^n).$$


Problem 2.3.5. Prove that each of the functions

$$G(x, y) = \begin{cases} 1, & x + y \ge 0, \\ 0, & x + y < 0, \end{cases}$$

$$G(x, y) = [x + y] = \text{the integer part of } x + y,$$

is right-continuous and increasing in each variable but, nevertheless, cannot be
treated as a (generalized) distribution function in $\mathbb{R}^2$.


Problem 2.3.6. Let $\mu$ denote the Lebesgue-Stieltjes measure associated with some
continuous generalized distribution function and let $A$ be any at most countable set.
Prove that $\mu(A) = 0$.


Problem 2.3.7. Prove that the Cantor set $\mathcal{N}$ is uncountable, perfect (meaning a
closed set in which every point is an accumulation point, or, equivalently, a closed
set without isolated points), nowhere dense (meaning a closed set without interior
points) and has vanishing Lebesgue measure.


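Problem 2.3.8. Let $(\Omega, \mathcal{F}, P)$ be any probability space and let $\mathcal{A}$ be any algebra
of subsets of $\Omega$ such that $\sigma(\mathcal{A}) = \mathcal{F}$. Prove that, for every $\varepsilon > 0$ and for every set
$B \in \mathcal{F}$, one can find a set $A_\varepsilon \in \mathcal{A}$ with the property

$$P(A_\varepsilon \,\triangle\, B) \le \varepsilon.$$


Hint. Consider the family $\mathfrak{B} = \{B \in \mathcal{F} : \forall \varepsilon > 0 \ \exists A \in \mathcal{A} : P(A \,\triangle\, B) \le \varepsilon\}$
and prove that $\mathfrak{B}$ is a σ-algebra, so that $\mathcal{F} \supseteq \mathfrak{B} \supseteq \sigma(\mathcal{A}) = \mathcal{F}$.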


Problem 2.3.9. Let $P$ be any probability measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. Prove that,
given any $\varepsilon > 0$ and any $B \in \mathcal{B}(\mathbb{R}^n)$, one can find a compact set $A_1$ and an open
set $A_2$ so that $A_1 \subseteq B \subseteq A_2$ and $P(A_2 \setminus A_1) \le \varepsilon$. (This result is used in the proof
of [P §2.3, Theorem 3].)

Hint. Consider the family $\mathfrak{B}$ of all sets $B \in \mathcal{B}(\mathbb{R}^n)$ with the following property:
for every $\varepsilon > 0$ there is a compact set $A_1$ and an open set $A_2$, so that the closure
$\bar{A}_2$ is compact and one has $A_1 \subseteq B \subseteq A_2$ and $P(A_2 \setminus A_1) \le \varepsilon$.
Then prove that $\mathfrak{B}$ constitutes a σ-algebra.


Problem 2.3.10. For a given probability measure $P$, verify the compatibility of
the measures $\{P_\tau\}$, defined by $P_\tau(B) = P(\mathcal{I}_\tau(B))$ (see (21) and Theorem 4 in
[P §2.3, 5]).


Problem 2.3.11. Verify that [ P §2.3, Tables 2 and 3] represent probability distri- 
butions, as claimed. 


Problem 2.3.12. Prove that the system A , introduced in [P §1.2, 3], is a σ-algebra.


Problem 2.3.13. Prove that the set function (A), A € oA , introduced in Remark 2 
in [ P §2.3, 1] is a measure. 


Problem 2.3.14. Prove by way of example that if the measure $\mu_0$, defined on the
algebra $\mathcal{A}$, is finitely additive but is not countably additive, then $\mu_0$ cannot be
extended to a countably additive measure on $\sigma(\mathcal{A})$.


Problem 2.3.15. Prove that any finitely additive probability measure, defined on
some algebra $\mathcal{A}$ of subsets of $\Omega$, can be extended to a finitely additive probability
measure defined on all subsets of $\Omega$.


Problem 2.3.16. Let $P$ be any probability measure defined on some σ-algebra $\mathcal{F}$
that consists of subsets of $\Omega$ and suppose that the set $C \subseteq \Omega$ is chosen so that
$C \notin \mathcal{F}$. Prove that the measure $P$ can be extended (countable additivity preserved)
to a measure on the σ-algebra $\sigma(\mathcal{F} \cup \{C\})$.


Problem 2.3.17. Prove that the support, denoted by supp F, of any continuous 
distribution function F must be a perfect set, i.e., a closed set without isolated 
points. (Recall that, for a given cumulative distribution function F defined on R, 
supp F is the smallest closed set G with the property $\mu(\mathbb{R} \setminus G) = 0$, where $\mu$ is the
measure associated with F—see Sect. A.2.) 

Give an example of a cumulative distribution function F, associated with some 
discrete probability measure on R, for which one has supp F = R, i.e., the support 
of F is the entire real line R. 


Problem 2.3.18. Prove the following fundamental result (see the end of
[P §2.3, 1]): every distribution function $F = F(x)$ can be expressed in the
form

$$F = \alpha_1 F_d + \alpha_2 F_{abc} + \alpha_3 F_{sc},$$

where $\alpha_i \ge 0$, $\alpha_1 + \alpha_2 + \alpha_3 = 1$, and

$F_d$ is a discrete distribution function (with jumps $p_k > 0$ at the points $x_k$):

$$F_d(x) = \sum_{\{k : x_k \le x\}} p_k;$$

$F_{abc}$ is an absolutely continuous distribution function:

$$F_{abc}(x) = \int_{-\infty}^{x} f(t)\, dt$$

with density $f = f(t)$, which is non-negative, Borel-measurable and
Lebesgue-integrable, with $\int_{-\infty}^{\infty} f(t)\, dt = 1$;

$F_{sc}$ is a continuous and singular distribution function, i.e., a continuous distribution
function for which the points of increase form a set of Lebesgue measure 0.


What can be said about the uniqueness of the above decomposition of the
distribution function $F = F(x)$?


Problem 2.3.19. (a) Prove that every real number $\omega \in [0, 1]$ admits a ternary (i.e.,
base 3) expansion of the form

$$\omega = \sum_{n=1}^{\infty} \frac{\omega_n}{3^n},$$

where $\omega_n \in \{0, 1, 2\}$, $n \ge 1$.


(b) Prove that if $\omega \in [0, 1]$ admits two ternary expansions, $\omega = \sum_{n=1}^{\infty} \omega_n / 3^n$ and
$\omega = \sum_{n=1}^{\infty} \omega'_n / 3^n$, both of which are non-terminating (i.e., $\sum_{n=1}^{\infty} |\omega_n| = \infty$ and
$\sum_{n=1}^{\infty} |\omega'_n| = \infty$), then one must have $\omega_n = \omega'_n$ for all $n \ge 1$ (uniqueness of
the non-terminating expansions).

Notice that non-uniqueness of the ternary expansion is possible for reals
$\omega \in [0, 1]$ that admit a terminating expansion of the form $\omega = \sum_{n=1}^{m} \omega_n / 3^n$, $m < \infty$.
In any such case, the following “canonical” expansion may be chosen:

1. If $\omega = \sum_{n=1}^{m-1} \omega_n / 3^n + 2/3^m$, set

$$\omega = \sum_{n=1}^{m-1} \frac{\omega_n}{3^n} + \frac{2}{3^m}.$$

2. And if $\omega = \sum_{n=1}^{m-1} \omega_n / 3^n + 1/3^m$, set

$$\omega = \sum_{n=1}^{m-1} \frac{\omega_n}{3^n} + \sum_{n=m+1}^{\infty} \frac{2}{3^n}.$$


(c) Suppose that the set $\mathcal{N} \subseteq [0, 1]$ comprises all points of increase for the
Cantor function on the interval [0, 1] (recall that the Cantor function is a canonical
example of a distribution function which is both continuous and singular; see
[P §2.3, 1]). Prove that every $\omega \in \mathcal{N}$ admits an expansion of the form

$$\omega = \sum_{n=1}^{\infty} \frac{\omega_n}{3^n},$$

where $\omega_n \in \{0, 2\}$, $n \ge 1$.


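Remark. It is interesting that (see [132]) if decimal expansions for the numbers
$\omega \in \mathcal{N}$ are considered, then there will be precisely 14 numbers in the Cantor set
$\mathcal{N}$ that admit terminating expansions. These numbers are

$$\frac{1}{4},\ \frac{3}{4},\ \frac{1}{10},\ \frac{3}{10},\ \frac{7}{10},\ \frac{9}{10},\ \frac{1}{40},\ \frac{3}{40},\ \frac{9}{40},\ \frac{13}{40},\ \frac{27}{40},\ \frac{31}{40},\ \frac{37}{40},\ \frac{39}{40}.$$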
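
The membership claim in the remark can be checked with exact rational arithmetic. The sketch below is only an illustration, not part of the original text: the function name `in_cantor_set` and the list `CLAIMED` are introduced here, and the test relies on the standard characterization that a point of [0, 1] lies in the Cantor set exactly when the map x -> 3x (for x <= 1/3) or x -> 3x - 2 (for x >= 2/3) can be iterated forever without landing in the open middle third. For a rational x the orbit is eventually periodic, so the loop terminates.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide membership of a rational x in [0, 1] in the Cantor set.

    Each step strips one ternary digit (0 or 2); hitting the open
    middle third (1/3, 2/3) means a digit 1 is unavoidable.
    """
    seen = set()
    while x not in seen:          # a repeated value means the orbit cycles
        seen.add(x)
        if x <= Fraction(1, 3):
            x = 3 * x             # leading ternary digit 0
        elif x >= Fraction(2, 3):
            x = 3 * x - 2         # leading ternary digit 2
        else:
            return False          # x lies in a removed middle third
    return True

# The 14 terminating decimals listed in the remark.
CLAIMED = [Fraction(a, b) for a, b in [
    (1, 4), (3, 4), (1, 10), (3, 10), (7, 10), (9, 10),
    (1, 40), (3, 40), (9, 40), (13, 40), (27, 40), (31, 40), (37, 40), (39, 40),
]]

if __name__ == "__main__":
    for q in CLAIMED:
        print(q, in_cantor_set(q))
```

The same function can be used to scan all fractions k/10^d for small d and compare the hits with the list above.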


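Problem 2.3.20. Let $\mathcal{N}$ denote the Cantor set inside the interval [0, 1].

(a) Prove that $\mathcal{N}$ has the same cardinality as the set [0, 1].

(b) Describe the sets that can be identified with $\mathcal{N} \oplus \mathcal{N}$ and $\mathcal{N} \ominus \mathcal{N}$, i.e., the
sets $\{\omega + \omega' : \omega \in \mathcal{N}, \omega' \in \mathcal{N}\}$ and $\{\omega - \omega' : \omega \in \mathcal{N}, \omega' \in \mathcal{N}\}$.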


Problem 2.3.21. Let C be any closed subset of the real line R. Give an example of 
a distribution function F, for which one can claim that the support of F is precisely 
the set C, i.e., supp F = C. 


Problem 2.3.22. Give an example of a σ-finite measure $\mu$ which is defined on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and

(a) is not a Lebesgue-Stieltjes measure, in other words, one cannot find a
non-decreasing and right-continuous function $G = G(x)$ (i.e., a generalized distribution
function) with the property $\mu((a, b]) = G(b) - G(a)$, $a < b$;

(b) is not a locally finite measure, in other words, every open neighborhood of
every point $x \in \mathbb{R}$ has infinite measure.


Problem 2.3.23. Find a subset of the interval [0, 1] which does not belong to the
collection of all Lebesgue-measurable sets $\bar{\mathcal{B}}([0, 1])$ (see [P §2.3, 1]).


Problem 2.3.24. Give a probabilistic proof of Euler’s product formula for the
Riemann zeta function; namely, consider the Riemann zeta function
$\zeta(\alpha) = \sum_{n=1}^{\infty} n^{-\alpha}$, $1 < \alpha < \infty$, and prove that the following representation, in which
$p_1, p_2, \ldots$ is the sequence of all prime numbers greater than 1, is in force:

$$\zeta(\alpha) = \prod_{i=1}^{\infty} \big(1 - p_i^{-\alpha}\big)^{-1}.$$

Hint. Let $\mathbb{N} = \{1, 2, \ldots\}$ be the set of all natural numbers endowed with the
σ-algebra $2^{\mathbb{N}}$ comprised of all possible subsets of the (countable) set $\mathbb{N}$ (see the
notation for $N(\mathcal{A})$ at the end of [P §1.1, 3]). Then define a probability measure $P$
on $(\mathbb{N}, 2^{\mathbb{N}})$ so that, given any $A \subseteq \mathbb{N}$, one has

$$P(A) = \frac{1}{\zeta(\alpha)} \sum_{n \in A} n^{-\alpha}.$$

Let $A(p_i) = \{p_i, 2p_i, \ldots\}$ denote the collection of those numbers $n \in \mathbb{N}$ for
which $p_i$ is a factor (i.e., a divisor) of $n$. Prove that

(a) $P(A(p_i)) = p_i^{-\alpha}$;

(b) The events $A(p_1), A(p_2), \ldots$ are independent;

(c) $\bigcap_{i=1}^{\infty} \bar{A}(p_i) = \{1\}$.

Furthermore, argue that, since (b) implies that the complements $\bar{A}(p_1), \bar{A}(p_2), \ldots$ are
also independent, (c) and (a) imply that

$$P\Big(\bigcap_{i=1}^{\infty} \bar{A}(p_i)\Big) = P(\{1\}) = \frac{1}{\zeta(\alpha)}$$

and, at the same time, that

$$P\Big(\bigcap_{i=1}^{\infty} \bar{A}(p_i)\Big) = \prod_{i=1}^{\infty} \big[1 - P(A(p_i))\big] = \prod_{i=1}^{\infty} \big(1 - p_i^{-\alpha}\big),$$

which completes the proof.
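
A quick numerical sanity check of the statements in this hint may be helpful before attempting the proof. The following is a minimal sketch, not a proof: the function names, the truncation level `limit` and the choice $\alpha = 2$ (for which $\zeta(2) = \pi^2/6$ is known) are assumptions made here for illustration only. It compares a truncated sum with a truncated product over primes and checks step (a) for one prime.

```python
import math

def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # strike out p*p, p*p + p, ... up to limit
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

def zeta_partial_sum(alpha, limit):
    """Truncated series: sum of n^(-alpha) over n <= limit."""
    return sum(n ** -alpha for n in range(1, limit + 1))

def euler_partial_product(alpha, limit):
    """Truncated product of 1 / (1 - p^(-alpha)) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** -alpha
    return product

def prob_multiples(p, alpha, limit):
    """Truncated version of P(A(p)) under the measure from the hint."""
    z = zeta_partial_sum(alpha, limit)
    return sum(n ** -alpha for n in range(p, limit + 1, p)) / z

if __name__ == "__main__":
    alpha, limit = 2.0, 100_000
    print(zeta_partial_sum(alpha, limit), euler_partial_product(alpha, limit))
    print(prob_multiples(3, alpha, limit), 3 ** -alpha)
```

Both truncations should approach the same value as `limit` grows, and the truncated probability of the multiples of 3 should be close to $3^{-\alpha}$.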


Problem 2.3.25. Give a probabilistic proof of Euler’s product formula for Euler’s
totient function $\varphi(n)$, which, for any $n \in \mathbb{N}$, gives the total number of positive
integers $k$ that do not exceed $n$ and are also relatively prime to $n$ (i.e., the only
common divisor of $n$ and $k$ is 1); namely, by using probabilistic reasoning, prove
that

$$\frac{\varphi(n)}{n} = \prod_{p \mid n} \Big(1 - \frac{1}{p}\Big),$$

where the product is taken over all prime numbers $p$ that divide $n$ (i.e., all prime
numbers $p$ that are factors of $n$).

Hint. Consider the usual uniform probability distribution $P(\{k\}) = 1/n$,
$1 \le k \le n$, on the set $\{1, \ldots, n\}$. For a fixed $n \in \mathbb{N}$, let
$A(p) = \{k \le n : p \text{ is a factor of } k\}$ and let $p_1, p_2, \ldots$ denote the (distinct) prime numbers that divide
$n$. Prove that:

(a) $P(A(p_i)) = p_i^{-1}$.

(b) The events $A(p_1), A(p_2), \ldots$ are independent.

(c) $\bigcap_{p \mid n} \bar{A}(p)$ can be identified with the event that $k \le n$ is relatively prime to $n$.

Then argue that

$$\frac{\varphi(n)}{n} = P\Big(\bigcap_{p \mid n} \bar{A}(p)\Big) = \prod_{p \mid n} \big[1 - P(A(p))\big] = \prod_{p \mid n} \Big(1 - \frac{1}{p}\Big).$$
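
The product formula and the two probabilistic facts used in the hint are easy to check exactly for small $n$. The sketch below is an illustration under the uniform distribution on $\{1, \ldots, n\}$ described above; the function names are introduced here and are not part of the original text.

```python
from fractions import Fraction
from math import gcd

def prime_factors(n):
    """Distinct prime divisors of n, found by trial division."""
    factors, p = [], 2
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def totient_by_counting(n):
    """phi(n): number of k in 1..n that are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def totient_ratio_by_product(n):
    """Right-hand side of the formula: product of (1 - 1/p) over primes p | n."""
    ratio = Fraction(1)
    for p in prime_factors(n):
        ratio *= 1 - Fraction(1, p)
    return ratio

def prob_divisible(n, d):
    """P{k : d divides k} under the uniform distribution on {1, ..., n}."""
    return Fraction(sum(1 for k in range(1, n + 1) if k % d == 0), n)

if __name__ == "__main__":
    n = 360
    print(Fraction(totient_by_counting(n), n), totient_ratio_by_product(n))
    # pairwise independence of A(2) and A(3) for n = 360
    print(prob_divisible(n, 6), prob_divisible(n, 2) * prob_divisible(n, 3))
```

Exact fractions are used so that the comparison involves no rounding.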


Problem 2.3.26. Prove that the Lebesgue measure on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ is invariant
under translation when $n \ge 1$, and invariant under rotation when $n \ge 2$.


Problem 2.3.27. In the context of Carathéodory’s theorem, prove by way of
example that the requirement for the system of sets $\mathcal{A}$ to be an algebra is essential
for both the existence and the uniqueness of the extension of the probability measure
$P$, originally defined on $\mathcal{A}$, to a measure defined on the σ-algebra $\mathcal{F} = \sigma(\mathcal{A})$.
Specifically:

(a) Construct a sample space $\Omega$ and two systems of subsets of $\Omega$, $\mathcal{E}$ and $\mathcal{F}$, such
that $\mathcal{E}$ is not an algebra and $\mathcal{F} = \sigma(\mathcal{E})$, and then construct a probability measure
$P$, defined on $\mathcal{E}$, which cannot be extended to a probability measure on the
σ-algebra $\mathcal{F}$.

(b) Construct a sample space $\Omega$, two systems of subsets of $\Omega$, $\mathcal{E}$ and $\mathcal{F}$ with
$\mathcal{F} = \sigma(\mathcal{E})$, and also two distinct probability measures, $P$ and $Q$, defined on the
σ-algebra $\mathcal{F}$, such that their restrictions to $\mathcal{E}$, i.e., the measures $P|\mathcal{E}$ and $Q|\mathcal{E}$ (see
[P §2.3, 4]), coincide.

Hint. To prove (a), consider the sample space $\Omega = \{1, 2, 3\}$, set

$$\mathcal{E} = \{\emptyset, \{1\}, \{1, 2\}, \{1, 3\}, \Omega\},$$

define $\mathcal{F}$ to be the σ-algebra of all subsets of $\Omega$, and, finally, set
$P(\Omega) = P(\{1, 2\}) = P(\{1, 3\}) = 1$, $P(\{1\}) = 1/2$, $P(\emptyset) = 0$.
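To prove (b), consider the sample space $\Omega = \{1, 2, 3, 4\}$, set

$$\mathcal{E} = \{\{1, 2\}, \{1, 3\}\},$$

define $\mathcal{F}$ to be the σ-algebra of all subsets of $\Omega$ and, finally, define
$P(\{1\}) = P(\{4\}) = 1/2$, $Q(\{2\}) = Q(\{3\}) = 1/2$, so that $P$ and $Q$ both assign
probability 1/2 to each of the two sets in $\mathcal{E}$.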


Problem 2.3.28. Let $F = F(x)$, $x \in \mathbb{R}$, be any distribution function. Prove that
for any $a > 0$ one has

$$\int_{\mathbb{R}} \big[F(x + a) - F(x)\big]\, dx = a.$$
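
The identity in Problem 2.3.28 can be illustrated numerically before it is proved. The sketch below is only an illustration: the two distribution functions (exponential with rate 1 and a symmetric two-point law on {0, 1}), the integration window and the step count are all choices made here, and the integral is approximated by a plain midpoint rule.

```python
import math

def exp_cdf(x):
    """Distribution function of the exponential law with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def two_point_cdf(x):
    """Distribution function of a law with mass 1/2 at 0 and 1/2 at 1."""
    if x < 0:
        return 0.0
    return 0.5 if x < 1 else 1.0

def shift_integral(cdf, a, lo=-10.0, hi=60.0, steps=140_000):
    """Midpoint-rule approximation of the integral of F(x + a) - F(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += cdf(x + a) - cdf(x)
    return total * h

if __name__ == "__main__":
    for a in (0.3, 0.7, 2.5):
        print(a, shift_integral(exp_cdf, a), shift_integral(two_point_cdf, a))
```

In both cases the approximation should be close to $a$, whether or not $F$ is continuous.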


Problem 2.3.29. The density of a given distribution function $F(x)$ is defined as any
non-negative and Riemann integrable function $f(x)$, $x \in \mathbb{R}$, for which one can write
$F(x) = \int_{-\infty}^{x} f(t)\, dt$, for all $x \in \mathbb{R}$. Sometimes (say, when integrals are considered
only in the sense of Riemann, not in the sense of Lebesgue) one does not suppose
that the function $f(x)$ is Borel measurable.

Give an example of a function $f(x)$ which is not Borel measurable, but
nevertheless represents a probability density (in the sense described above) and
defines a probability measure on the Borel subsets $B$ of the real line $\mathbb{R}$, according to
the formula $\mu(B) = \int_{-\infty}^{\infty} f(x) I_B(x)\, dx$.

Hint. The collection of all Borel subsets of the interval [0, 1] has cardinality $c$,
i.e., the cardinality of the continuum, while the cardinality of the Lebesgue subsets
of [0, 1] is $2^c$ (see Problem 2.2.32). By using this fact conclude that if $\mathcal{N}$ denotes the
Cantor set inside the interval [0, 1], then one can find a subset $D \subseteq [1/2, 1] \cap \mathcal{N}$
which is not a Borel-measurable set and has Lebesgue measure 0. Then convince
yourself that the function $f(x) = 2 I_{[1/2, 1] \setminus D}(x)$, not being Borel measurable, is
actually Riemann integrable and the integral $\int f(x) I_B(x)\, dx$ is well defined and
gives rise to a probability measure on the Borel sets $B \subseteq [0, 1]$.


Problem 2.3.30. Find two sets, $A$ and $B$, inside the real line $\mathbb{R}$, which have
Lebesgue measure equal to 0, and yet have the property $A \oplus B = \mathbb{R}$.


Problem 2.3.31. Let $\mathcal{A}$ be any algebra of subsets of the set $\Omega$ and let
$\mathcal{F} = \sigma(\mathcal{A})$. Let $\mu$ be any σ-finite measure on $\mathcal{F}$. Prove that:

(a) The measure $\mu$ may not be σ-finite on $\mathcal{A}$.

(b) If the measure $\mu$ is σ-finite on $\mathcal{A}$ then the analog of the property stated in
Problem 2.3.8 still holds, i.e., for every $\varepsilon > 0$ and every $B \in \mathcal{F}$ with $\mu(B) < \infty$
one can find a set $A_\varepsilon \in \mathcal{A}$ such that $\mu(A_\varepsilon \,\triangle\, B) \le \varepsilon$.

(c) If the measure $\mu$ is not σ-finite on $\mathcal{A}$, then the claim made in (b) may be false.


Problem 2.3.32. Prove that a probability measure $\mu$ defined on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ is
always regular, in the sense that for any Borel set $B \in \mathcal{B}(\mathbb{R}^d)$ one has

$$\mu(B) = \inf_{U} \{\mu(U) : U \supseteq B,\ U \text{ is an open set}\}$$

and

$$\mu(B) = \sup_{F} \{\mu(F) : F \subseteq B,\ F \text{ is a closed set}\}.$$

In addition, prove that the following relation holds for any Borel set $B \in \mathcal{B}(\mathbb{R}^d)$:

$$\mu(B) = \sup_{K} \{\mu(K) : K \subseteq B,\ K \text{ is a compact set}\}.$$


Problem 2.3.33. (Bertrand’s Paradox.) The well-known Bertrand’s Paradox is a
good illustration of the fact that in many probabilistic models (in particular, models
involving geometric probabilities, which the paradox is concerned with) one must
be careful in the formulation of the model and in giving meaning to phrases like
“a randomly chosen point,” “a randomly chosen figure,” etc. (This was already
discussed in Problem 1.1.12.)

The problem, found in Bertrand’s book [7], and its contradicting answers
(whence the term “paradox”), found by using different calculation methods, were
understood to imply that in random experiments with infinitely many outcomes there
are events to which it is impossible to assign probabilities in a meaningful way.

Bertrand’s problem may be stated as follows: Suppose that a chord $AB$, with
endpoints $A$ and $B$, is chosen at random on a circle of radius $r$. What is the probability
that the length $|AB|$ of the (random) chord $AB$ is smaller than the radius $r$?

Consider the following three possible formulations of this problem:

(a) The phrase “the chord $AB$ is chosen at random” is understood to mean that
the points $A$ and $B$ are sampled independently from the uniform distribution on the
circle.

Prove that in this case $P_a\{|AB| < r\} = 1/3$. (Fix the point $A$ and consider the
regular hexagon inscribed in the circle so that one of its vertices coincides with $A$.)

(b) Every chord $AB$ is uniquely determined by the point $M \in AB$, chosen so that
$OM \perp AB$, $O$ being the center of the circle. The phrase “the chord $AB$ is chosen
at random” is understood to mean that the point $M$ is sampled from the uniform
distribution on the disc (surrounded by the circle).

Prove that in this case $P_b\{|AB| < r\} = 1/4$. (Convince yourself that the event
$\{|AB| < r\}$ is the same as the event that $M$ belongs to the ring bounded by the
circles of radius $r$ and radius $r\sqrt{3/4}$.)

(c) As the length of the chord $AB$ is determined by its distance to the center of
the circle and not by its position on the circle, one may suppose that $AB$ is parallel
to the horizontal diameter $CD$, while the random point $M \in AB$, defined as the
intersection between $AB$ and the vertical diameter $EF$ (which is perpendicular to
$CD$), is uniformly distributed on $EF$.

Prove that in this case

$$P_c\{|AB| < r\} = 1 - \frac{\sqrt{3}}{2} \quad (\approx 0.13).$$

(One must prove that the event $\{|AB| < r\}$ coincides with the event $\{|OM| > \sqrt{3}\, r/2\}$,
where $|OM|$ is the distance between $M$ and the center, $O$, of the circle.)
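
The three formulations can be compared by simulation. The following Monte Carlo sketch is an illustration only: the sampler names, the sample size and the seed are choices made here, and each sampler is one straightforward way of realizing the corresponding formulation (a), (b) or (c) for a circle of radius $r = 1$.

```python
import math
import random

def chord_from_endpoints(rng, r=1.0):
    """(a) Two independent uniform points on the circle."""
    a = rng.uniform(0.0, 2.0 * math.pi)
    b = rng.uniform(0.0, 2.0 * math.pi)
    return 2.0 * r * abs(math.sin((a - b) / 2.0))

def chord_from_midpoint_in_disc(rng, r=1.0):
    """(b) Midpoint M uniform in the disc: |OM| = r * sqrt(U), U uniform on [0, 1)."""
    d = r * math.sqrt(rng.random())
    return 2.0 * math.sqrt(r * r - d * d)

def chord_from_point_on_diameter(rng, r=1.0):
    """(c) M uniform on a diameter: by symmetry |OM| is uniform on [0, r)."""
    d = r * rng.random()
    return 2.0 * math.sqrt(r * r - d * d)

def prob_shorter_than_radius(sampler, trials=200_000, seed=0, r=1.0):
    """Monte Carlo estimate of P{|AB| < r} for the given chord sampler."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if sampler(rng, r) < r)
    return hits / trials

if __name__ == "__main__":
    print("(a)", prob_shorter_than_radius(chord_from_endpoints), 1 / 3)
    print("(b)", prob_shorter_than_radius(chord_from_midpoint_in_disc), 1 / 4)
    print("(c)", prob_shorter_than_radius(chord_from_point_on_diameter),
          1 - math.sqrt(3) / 2)
```

The three estimates should settle near three different values, which is exactly the point of the problem: each formulation defines a different probability measure on the set of chords.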


Problem 2.3.34. (Continuation of Problem 2.3.33.) Argue that the situation
described in Problem 2.3.33 can actually be connected to three different problems.
More specifically, let $\rho = |OM|$, where $O$ is the center of the circle and the point
$M$ is defined as in part (b) in the previous problem, and let $\theta$ denote the angle
between the chord $AB$ and some fixed direction, so that, assuming that $r = 1$, for a
chord with $|AB| > 0$ one must have $0 \le \rho < 1$, $0 \le \theta < 2\pi$.

Prove that in parts (a), (b) and (c) in the previous problem the joint distribution
of $(\rho, \theta)$ is given, respectively, by the densities

$$p_a(\rho, \theta) = \frac{1}{\pi^2 \sqrt{1 - \rho^2}}, \qquad p_b(\rho, \theta) = \frac{\rho}{\pi}, \qquad p_c(\rho, \theta) = \frac{1}{2\pi}.$$

Consequently, there is no “paradox”, as the phrase “the chord $AB$ is chosen at
random” is given a completely different probabilistic meaning in parts (a), (b)
and (c).


Problem 2.3.35. Let $(X, \mathcal{X}, \mu)$ be the space associated with some measurable
structure $(X, \mathcal{X})$ and some countably additive measure $\mu$ (see [P §2.3, Definitions
5 and 6]).

A measurable set $A$ is said to be an atom relative to the measure $\mu$, or,
equivalently, a $\mu$-atom, if $\mu(A) > 0$ and for every measurable set $B$ one has either
$\mu(A \cap B) = 0$ or $\mu(A \setminus B) = 0$. The measure $\mu$ is said to be atomic if every
measurable set with a positive $\mu$-measure contains an atom.

The measure $\mu$ is said to be non-atomic if no $\mu$-atoms exist.

The measure $\mu$ is said to be a diffusion measure if every one-point set is a
measurable $\mu$-null set.

Give examples of atomic, non-atomic and diffusion measures and also an
example of a measure which is simultaneously an atomic measure and a diffusion
measure.

Prove that the sum of an atomic and a non-atomic measure may be an atomic
measure.


Problem 2.3.36. Let $P$ and $\tilde{P}$ be any two probability measures on $(\Omega, \mathcal{F})$ such that
$P(A) = \tilde{P}(A)$ for any $A \in \mathcal{F}$ with $P(A) \le 1/2$. Prove that when this condition is
satisfied then $P(A) = \tilde{P}(A)$ for every set $A \in \mathcal{F}$.


2.4 Random Variables I 


Problem 2.4.1. Prove that the random variable $\xi$ has a continuous distribution, or,
“$\xi$ is continuous” for short, if and only if $P\{\xi = x\} = 0$ for any $x \in \mathbb{R}$.


Problem 2.4.2. Can one claim that if $|\xi|$ is $\mathcal{F}$-measurable then $\xi$ also must be
$\mathcal{F}$-measurable?


Problem 2.4.3. Prove that $x^n$, $x^+ = \max(x, 0)$, $x^- = -\min(x, 0)$ and
$|x| = x^+ + x^-$ are all Borel functions of $x$. Prove the following more general statement:
every continuous function $f = f(x)$, $x \in \mathbb{R}$, is Borel measurable.

Hint. Given any $a \in \mathbb{R}$, consider the open set $\{x \in \mathbb{R} : f(x) < a\}$ and use the
result established in Problem 2.2.7.


Problem 2.4.4. Prove that if $\xi$ and $\eta$ are $\mathcal{F}$-measurable then
$\{\omega : \xi(\omega) = \eta(\omega)\} \in \mathcal{F}$.


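Problem 2.4.5. Let $\xi$ and $\eta$ be any two random variables on $(\Omega, \mathcal{F})$ and let $A \in \mathcal{F}$.
Prove that the function

$$\zeta(\omega) = \xi(\omega) I_A(\omega) + \eta(\omega) I_{\bar{A}}(\omega)$$

also must be a random variable.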


Problem 2.4.6. Let $\xi_1, \ldots, \xi_n$ be any $n \ge 1$ random variables and let $\varphi(x_1, \ldots, x_n)$
be any Borel-measurable function. Prove that $\varphi(\xi_1(\omega), \ldots, \xi_n(\omega))$ is a random
variable.

Hint. Show first that the map

$$\omega \mapsto (\xi_1(\omega), \ldots, \xi_n(\omega)) \in \mathbb{R}^n$$

is $\mathcal{F}/\mathcal{B}(\mathbb{R}^n)$-measurable. Then use the fact that the map $\omega \mapsto \varphi(\xi_1(\omega), \ldots, \xi_n(\omega))$
is a composition of measurable maps.


Problem 2.4.7. Let $\xi$ and $\eta$ be any two random variables with values in the
set $\{1, \ldots, N\}$ and suppose that $\mathcal{F}_\xi = \mathcal{F}_\eta$. Prove that there is a permutation
$(i_1, \ldots, i_N)$ of the set $(1, \ldots, N)$ for which one can claim that for any $j = 1, \ldots, N$
the sets $\{\omega : \xi = j\}$ and $\{\omega : \eta = i_j\}$ coincide.

Hint. Consider using [P §2.4, Theorem 3], according to which there are
functions $\varphi$ and $\psi$ such that $\xi = \varphi(\eta)$ and $\eta = \psi(\xi)$. Then argue that $i_j = \psi(j)$
gives the desired permutation.


Problem 2.4.8. Give an example of a random variable $\xi$ that admits a probability
density $f(x)$ such that $\lim_{x \to \infty} f(x)$ does not exist and, therefore, the function
$f(x)$ does not vanish at infinity.


Problem 2.4.9. Let $\xi$ and $\eta$ be any two bounded random variables with $|\xi| \le c_1$,
$|\eta| \le c_2$. Prove that if

$$E \xi^m \eta^n = E \xi^m \cdot E \eta^n$$

for any $m, n \ge 1$, then $\xi$ and $\eta$ must be independent.


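Problem 2.4.10. Let $\xi$ and $\eta$ be any two random variables whose generated
σ-algebras, $\mathcal{F}_\xi$ and $\mathcal{F}_\eta$, coincide. Prove that if $x \in \mathbb{R}$ and $\{\omega : \xi(\omega) = x\} \ne \emptyset$,
then there is a real number $y \in \mathbb{R}$ such that $\{\omega : \xi(\omega) = x\} = \{\omega : \eta(\omega) = y\}$.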


Problem 2.4.11. Let $E$ be any at most countable subset of $\mathbb{R}$ and consider the map
$\xi : \Omega \to E$. Prove that $\xi$ is a random variable on $(\Omega, \mathcal{F})$ if and only if
$\{\omega : \xi(\omega) = x\} \in \mathcal{F}$ for any $x \in E$.


Problem 2.4.12. Let $\xi$ be any random variable with the property $P\{\xi \ne 0\} > 0$.
Suppose that for some $a$ and $b$ the random variables $a\xi$ and $b\xi$ have one and the
same distribution, i.e., $F_{a\xi}(x) = F_{b\xi}(x)$, $x \in \mathbb{R}$. Can one claim that this is possible
only if $a = b$? Does the assumption $a > 0$ and $b > 0$ change the answer to the last
question?


Problem 2.4.13. Let $(\Omega, \mathcal{F}, P)$ be any probability space and let $(\Omega, \bar{\mathcal{F}}^P, P)$ be
its completion relative to the measure $P$ (see Problem 2.2.34 and Remark 1
in [P §2.3, 1]). Prove that, given any random variable $\bar{\xi} = \bar{\xi}(\omega)$ on $(\Omega, \bar{\mathcal{F}}^P, P)$,
it is always possible to find a random variable $\xi = \xi(\omega)$, defined on $(\Omega, \mathcal{F}, P)$,
for which one can claim that $P\{\bar{\xi} \ne \xi\} = 0$, i.e., $\bar{\xi}$ and $\xi$ differ only on a set with
probability 0.


Problem 2.4.14. Let $\xi$ be any random variable and $B$ be any Borel set in $\mathbb{R}$.
Prove that

$$\sigma\big(\xi I(\xi \in B)\big) = \xi^{-1}(B) \cap \sigma(\xi).$$


Problem 2.4.15. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
every one of which is uniformly distributed in the interval [0, 1]. Given any $\omega \in \Omega$,
consider the set $A(\omega) \subseteq [0, 1]$ which consists of all values $\xi_1(\omega), \xi_2(\omega), \ldots$. Prove
that for almost every $\omega \in \Omega$ one can claim that the set $A(\omega)$ is everywhere dense
in [0, 1].


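Problem 2.4.16. Let $\xi_1, \xi_2, \ldots$ be any sequence of Bernoulli random variables,
such that $P\{\xi_k = 1\} = P\{\xi_k = -1\} = 1/2$, $k \ge 1$. Consider the random walk
$S = (S_n)_{n \ge 0}$, defined by $S_0 = 0$ and $S_n = \xi_1 + \ldots + \xi_n$, for $n \ge 1$.

Let $\sigma_0 = \inf\{n > 0 : S_n = 0\}$ be the first moment (after $n = 0$) at
which the random walk returns to 0, with the understanding that $\sigma_0 = \infty$ if
$\{n > 0 : S_n = 0\} = \emptyset$.

Prove that

$$P\{\sigma_0 > 2n\} = C_{2n}^{n} \Big(\frac{1}{2}\Big)^{2n} \quad \text{and} \quad P\{\sigma_0 = 2n\} = \frac{1}{2n - 1}\, C_{2n}^{n} \Big(\frac{1}{2}\Big)^{2n}.$$

By using Stirling’s formula, argue that for large $n$ one has

$$P\{\sigma_0 > 2n\} \sim \frac{1}{\sqrt{\pi n}} \quad \text{and} \quad P\{\sigma_0 = 2n\} \sim \frac{1}{2\sqrt{\pi}\, n^{3/2}}$$

(comp. with the formulas for $u_{2n}$ and $f_{2n}$, given in [P §1.10]). For example, the above
formulas imply that $P\{\sigma_0 < \infty\} = 1$ and $E\sigma_0^{\alpha} < \infty$ if and only if $\alpha < 1/2$ (see
[P §1.9] for related results).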
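
For small $n$ the distribution of the first return time can be computed exactly and compared with binomial expressions of the kind stated in Problem 2.4.16. The sketch below assumes, as an interpretation made here, that the steps of the walk are independent; the function name `first_return_distribution` is also introduced here. It propagates the probabilities of paths that have not yet returned to 0.

```python
from fractions import Fraction
from math import comb

def first_return_distribution(max_steps):
    """Exact P{sigma_0 = k} for k = 1..max_steps for the symmetric walk.

    alive[s] is the probability of being at s without having
    returned to 0 at any earlier positive time.
    """
    half = Fraction(1, 2)
    alive = {0: Fraction(1)}
    returned = {}
    for k in range(1, max_steps + 1):
        nxt = {}
        for s, p in alive.items():
            for step in (-1, 1):
                nxt[s + step] = nxt.get(s + step, 0) + p * half
        # mass arriving at 0 is a first return at time k
        returned[k] = nxt.pop(0, Fraction(0))
        alive = nxt
    return returned

if __name__ == "__main__":
    dist = first_return_distribution(10)
    for n in range(1, 6):
        survive = 1 - sum(dist[k] for k in range(1, 2 * n + 1))
        print(n, dist[2 * n], survive,
              Fraction(comb(2 * n, n), 4 ** n))
```

Here `survive` is $P\{\sigma_0 > 2n\}$, printed next to $C_{2n}^{n} 2^{-2n}$ for comparison; dividing the latter by $2n - 1$ gives the candidate value for $P\{\sigma_0 = 2n\}$.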


Problem 2.4.17. In the context of the previous problem, let
$\sigma_k = \inf\{n \ge 1 : S_n = k\}$, $k = 1, 2, \ldots$. Prove that

$$P\{\sigma_k = n\} = \frac{k}{n}\, P\{S_n = k\}$$

and conclude that

$$P\{\sigma_k = n\} = \frac{k}{n}\, C_n^{(n+k)/2} \Big(\frac{1}{2}\Big)^{n}.$$


Problem 2.4.18. Let $\xi = \xi(\omega)$ be any non-degenerate random variable, such that,
with some constants $a > 0$ and $b$, the distribution of $a\xi + b$ coincides with the
distribution of $\xi$. Prove that this is only possible if $a = 1$ and $b = 0$.


Problem 2.4.19. Let $\xi_1$ and $\xi_2$ be any two exchangeable random variables, i.e., $\xi_1$
and $\xi_2$ are such that the distribution law of $(\xi_1, \xi_2)$ coincides with the distribution
law of $(\xi_2, \xi_1)$. Prove that if $f = f(x)$ and $g = g(x)$ are any two non-negative and
non-decreasing functions, then

$$E f(\xi_1) g(\xi_1) \ge E f(\xi_1) g(\xi_2).$$


Problem 2.4.20. (On [P §2.4, Theorem 2].) Let $\xi_1, \xi_2, \ldots$ be any sequence of
real-valued random variables. Prove that

$$B := \{\omega : \lim \xi_n(\omega) \text{ exists and is finite}\} \in \mathcal{F}.$$

Hint. Use the fact that $B$ may be expressed as:

$$B = \{\liminf \xi_n > -\infty\} \cap \{\limsup \xi_n < \infty\} \cap \{\limsup \xi_n - \liminf \xi_n = 0\}.$$


Problem 2.4.21. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables that share the same continuous distribution function.
Let $A_1, A_2, \ldots$ be the sequence of events such that $A_1 = \Omega$ and

$$A_n = \{\xi_n > \xi_m \text{ for all } m < n\}, \quad n \ge 2,$$

i.e., $A_n$ is the event that a “record” occurs at time $n$. Prove that the events $A_1, A_2, \ldots$
are independent and that $P(A_n) = 1/n$, $n \ge 1$.
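
For a finite horizon the claim about records can be verified by brute force. The sketch below rests on one standard observation, stated here as an assumption of the illustration: for independent observations with a common continuous distribution, ties have probability 0 and the relative ranks of the first $N$ observations form a uniformly random permutation. The function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial, prod

def record_times(ranks):
    """Set of times n (1-based) at which a new running maximum occurs."""
    best, times = -1, set()
    for n, r in enumerate(ranks, start=1):
        if r > best:
            times.add(n)
            best = r
    return times

def joint_record_probability(N, times):
    """Exact probability that records occur at all the given times (<= N)."""
    wanted = set(times)
    hits = sum(1 for perm in permutations(range(N))
               if wanted <= record_times(perm))
    return Fraction(hits, factorial(N))

def records_are_independent(N):
    """Check P(intersection of A_n over n in T) = product of 1/n for every T in {1..N}."""
    for size in range(N + 1):
        for times in combinations(range(1, N + 1), size):
            expected = prod((Fraction(1, n) for n in times), start=Fraction(1))
            if joint_record_probability(N, times) != expected:
                return False
    return True

if __name__ == "__main__":
    N = 5
    print([joint_record_probability(N, (n,)) for n in range(1, N + 1)])
    print(records_are_independent(N))
```

The enumeration covers all $N!$ orderings, so it is only practical for small $N$, but it checks both the marginal probabilities and the product rule for every sub-collection of the events.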


Problem 2.4.22. Let $\xi$ and $\eta$ be any two random variables, such that Law($\eta$), i.e.,
the distribution law of $\eta$, is absolutely continuous (in the sense that the associated
distribution function $F_\eta$ is absolutely continuous). Prove that:
(a) If $\xi$ and $\eta$ are independent, then Law($\xi + \eta$) is also absolutely continuous.
(b) If $\xi$ and $\eta$ are not independent, then Law($\xi + \eta$) may not be absolutely
continuous.


Problem 2.4.23. Let $\xi$ and $\eta$ be any two random variables, such that $\xi$ is discrete
and $\eta$ is singular, i.e., $F_\xi$ is a discrete distribution function and $F_\eta$ is a singular
distribution function. Prove that the distribution function $F_{\xi + \eta}$, associated with the
random variable $\xi + \eta$, is singular.


Problem 2.4.24. Let $(\Omega, \mathcal{F})$ be any measurable space, such that the σ-algebra $\mathcal{F}$
is (countably) generated by some partition $\mathcal{D} = \{D_1, D_2, \ldots\}$ (see [P §2.2, 1]).
Prove that the σ-algebra $\mathcal{F}$ can be identified with the σ-algebra $\mathcal{F}_X$, generated by
the random variable

$$X(\omega) = \sum_{n=1}^{\infty} \frac{\varphi(I_{D_n}(\omega))}{10^n},$$

where $\varphi(0) = 3$ and $\varphi(1) = 5$.


Problem 2.4.25. (a) Suppose that the random variable $X$ has a symmetric
distribution, i.e., Law($X$) = Law($-X$). Prove that Law($X$) = Law($\xi Y$), where $\xi$ and $Y$
are independent random variables, such that $P\{\xi = 1\} = P\{\xi = -1\} = 1/2$ and
Law($Y$) = Law($|X|$).

(b) Suppose that $\xi$ and $Y$ are two independent random variables and that
$P\{\xi = 1\} = P\{\xi = -1\} = 1/2$. Prove that $\xi$ and $\xi Y$ are independent if and only if
$Y$ has a symmetric distribution, i.e., Law($Y$) = Law($-Y$).


Problem 2.4.26. Suppose that the random variable $X$ takes only two values, $x_1$ and
$x_2$, $x_1 \ne x_2$, and that the random variable $Y$ also takes only two values, $y_1$ and $y_2$,
$y_1 \ne y_2$. Prove that if cov($X, Y$) = 0 then $X$ and $Y$ must be independent.


Problem 2.4.27. Suppose that $\xi_1, \xi_2, \ldots$ are independent and identically distributed
random variables, all being uniformly distributed in the interval [0, 1]. Given any
$0 \le x \le 1$, set

$$\tau(x) = \min\{n \ge 1 : \xi_1 + \cdots + \xi_n > x\}.$$

Prove that $P\{\tau(x) > n\} = x^n / n!$, $n \ge 1$.
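
The tail formula in Problem 2.4.27 lends itself to a quick Monte Carlo check. The sketch below is illustrative only: the function names, the sample size, the seed and the test point $x = 0.8$ are choices made here, and the estimate is subject to sampling error of order $1/\sqrt{\text{trials}}$.

```python
import math
import random

def tau(x, rng):
    """First n >= 1 at which the partial sum of uniform(0, 1) draws exceeds x."""
    total, n = 0.0, 0
    while total <= x:
        total += rng.random()
        n += 1
    return n

def tail_estimate(x, n, trials=200_000, seed=1):
    """Monte Carlo estimate of P{tau(x) > n}."""
    rng = random.Random(seed)
    return sum(1 for _ in range(trials) if tau(x, rng) > n) / trials

if __name__ == "__main__":
    x = 0.8
    for n in (1, 2, 3):
        print(n, tail_estimate(x, n), x ** n / math.factorial(n))
```

Each estimate should be close to the corresponding value of $x^n / n!$, which is the volume of the simplex $\{u_1 + \cdots + u_n \le x\}$ inside the unit cube when $0 \le x \le 1$.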


Problem 2.4.28. Suppose that $X_1$, $X_2$ and $X_3$ are independent and identically
distributed random variables with exponential density $f(x) = e^{-x} I(x > 0)$. Define
the random variables

$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}$$

and $Y_3 = X_1 + X_2 + X_3$.

Prove that the above random variables, $Y_1$, $Y_2$ and $Y_3$, are independent.


Problem 2.4.29. Suppose that $X_1$ and $X_2$ are independent random variables,
both having a $\chi^2$-distribution, respectively, with $r_1$ and $r_2$ degrees of freedom
(see formula [P §2.8, (34)], or [P §2.3, Table 3]). Prove that the random variables
$Y_1 = X_1 / X_2$ and $Y_2 = X_1 + X_2$ are independent (comp. with the statements of
Problems 2.13.34 and 2.13.39).


2.5 Random Elements 


Problem 2.5.1. Let $\xi_1, \ldots, \xi_n$ be any family of $n$ discrete random variables. Prove
that these random variables are independent if and only if for every choice of the
real numbers $x_1, \ldots, x_n$ one has

$$P\{\xi_1 = x_1, \ldots, \xi_n = x_n\} = \prod_{i=1}^{n} P\{\xi_i = x_i\}.$$


Problem 2.5.2. Give a complete proof of the fact that every random function
$X(\omega) = (\xi_t(\omega))_{t \in T}$ is a random process in the sense of [P §2.5, Definition 3]
and vice versa.

Hint. If $X = X(\omega)$ is an $\mathcal{F}/\mathcal{B}(\mathbb{R}^T)$-measurable function, then for every $t \in T$
and $B \in \mathcal{B}(\mathbb{R})$ one has

$$\{\omega : \xi_t(\omega) \in B\} = \{\omega : X(\omega) \in C\} \in \mathcal{F}, \quad \text{where } C = \{x \in \mathbb{R}^T : x_t \in B\}.$$

Conversely, it is enough to consider sets $C \in \mathcal{B}(\mathbb{R}^T)$ of the form
$\{x : x_{t_1} \in B_1, \ldots, x_{t_n} \in B_n\}$, $B_1, \ldots, B_n \in \mathcal{B}(\mathbb{R})$, whose preimages
$\{\omega : X(\omega) \in C\}$, obviously, belong to $\mathcal{F}$.


Problem 2.5.3. Let $X_1, \ldots, X_n$ be random elements with values, respectively, in
$(E_1, \mathcal{E}_1), \ldots, (E_n, \mathcal{E}_n)$. Furthermore, suppose that $(E'_1, \mathcal{E}'_1), \ldots, (E'_n, \mathcal{E}'_n)$ are
measurable spaces and that $g_1, \ldots, g_n$ are, respectively, $\mathcal{E}_1/\mathcal{E}'_1, \ldots, \mathcal{E}_n/\mathcal{E}'_n$-measurable
functions. Prove that if $X_1, \ldots, X_n$ are independent, then the random elements
$g_1 \circ X_1, \ldots, g_n \circ X_n$ also must be independent, where $g_i \circ X_i = g_i(X_i)$, $i = 1, \ldots, n$.

Hint. It is enough to notice that for any $B_i \in \mathcal{E}'_i$, $i = 1, \ldots, n$, one has

$$P\{g_1(X_1) \in B_1, \ldots, g_n(X_n) \in B_n\} = P\{X_1 \in g_1^{-1}(B_1), \ldots, X_n \in g_n^{-1}(B_n)\}.$$


Problem 2.5.4. Let $X_1, X_2, \ldots$ be any infinite sequence of exchangeable random
variables, i.e., the joint distribution of any $k$ elements of the sequence with distinct
indices, say, $X_{i_1}, \ldots, X_{i_k}$, depends on $k$ but not on the choice or the order of
the indices $i_1, \ldots, i_k$ (comp. with the definition in Problem 2.1.11). Prove that if
$E X_n^2 < \infty$, $n \ge 1$, then the covariance of $X_1$ and $X_2$ satisfies cov($X_1, X_2$) $\ge 0$.

Hint. Using the exchangeability, write the variance $D\big(\sum_{i=1}^{n} X_i\big)$ in terms of the
first two moments and the covariances and then take the limit as $n \to \infty$.


Problem 2.5.5. Let $\xi_1, \ldots, \xi_m$ and $\eta_1, \ldots, \eta_n$ be any two (arbitrarily chosen) sets
of random variables. Define the vectors $X = (\xi_1, \ldots, \xi_m)$ and $Y = (\eta_1, \ldots, \eta_n)$
and suppose that the following conditions are satisfied:

(i) the random variables $\xi_1, \ldots, \xi_m$ are independent;
(ii) the random variables $\eta_1, \ldots, \eta_n$ are independent;
(iii) the random vectors $X$ and $Y$, treated as (random) elements of, respectively,
$\mathbb{R}^m$ and $\mathbb{R}^n$, are independent.

Prove that the random variables $\xi_1, \ldots, \xi_m, \eta_1, \ldots, \eta_n$ are independent.


Problem 2.5.6. Consider the random vectors $X = (\xi_1, \ldots, \xi_m)$ and
$Y = (\eta_1, \ldots, \eta_n)$ and suppose that their components $\xi_1, \ldots, \xi_m, \eta_1, \ldots, \eta_n$ are
independent.

(a) Prove that the random vectors $X$ and $Y$, treated as random elements, are
independent (comp. with Problem 2.5.5).

(b) Let $f : \mathbb{R}^m \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}$ be two Borel functions. Prove that the random
variables $f(\xi_1, \ldots, \xi_m)$ and $g(\eta_1, \ldots, \eta_n)$ are independent.


Problem 2.5.7. Suppose that $(\Omega, \mathcal{F})$ is a measurable space and let $(E, \mathcal{E}, \rho)$ be
a metric space endowed with metric $\rho$ and a Borel σ-algebra $\mathcal{E}$, associated with
the metric $\rho$ (see [P §2.2]). Let $X_1(\omega), X_2(\omega), \ldots$ be some sequence of
$\mathcal{F}/\mathcal{E}$-measurable functions (i.e., random elements), such that for any $\omega \in \Omega$ the limit

$$X(\omega) = \lim_{n \to \infty} X_n(\omega)$$

exists. Prove that the limit $X(\omega)$, treated as a function of $\omega \in \Omega$, must be
$\mathcal{F}/\mathcal{E}$-measurable.


Problem 2.5.8. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, let $\mathcal{F}_n = \sigma(\xi_1, \ldots, \xi_n)$, $n \ge 1$, and let $\tau$ be any
stopping time (relative to $(\mathcal{F}_n)_{n \ge 1}$). Set

$$\eta_n(\omega) = \xi_{n + \tau(\omega)}(\omega).$$

Prove that the sequence $(\eta_1, \eta_2, \ldots)$ has the same distribution as the sequence
$(\xi_1, \xi_2, \ldots)$.


2.6 The Lebesgue Integral: Expectation 


Problem 2.6.1. Prove that the representation in [P §2.6, (6)] is indeed in force.

Hint. Let $S$ denote the space of simple functions $s$. If $s \in \{s \in S : s \le \xi\}$
and if $(\xi_n)_{n \ge 1}$ is some sequence of simple random variables such that $\xi_n \uparrow \xi$,
then $\max(\xi_n, s) \uparrow \xi$ and $E s \le E \max(\xi_n, s)$. From the last inequality one can
conclude that $E s \le E \xi$ and that $\sup_{\{s \in S : s \le \xi\}} E s \le E \xi$. The opposite inequality
follows directly from the construction of $E \xi$.


Problem 2.6.2. Verify the following generalization of property E, described in
[P §2.6, 3]. Suppose that $\xi$ and $\eta$ are two random variables for which the
expectations $E\xi$ and $E\eta$ are well defined and the expression $E\xi + E\eta$ is meaningful,
in the sense that it does not have the form $\infty - \infty$ or the form $-\infty + \infty$. Then one
can write

$$E(\xi + \eta) = E\xi + E\eta.$$

Hint. Just as in the proof of property E, one must consider the infinities arising
from the representations $\xi = \xi^+ - \xi^-$ and $\eta = \eta^+ - \eta^-$. If, for example, $E\xi^+ = \infty$,
then, by using the assumptions in the problem, one can prove by contradiction that
$E(\xi + \eta)^+ = \infty$.


Problem 2.6.3. Generalize property G in [P §2.6, 3] by showing that if $\xi = \eta$ (a. e.) and $\mathsf{E}\xi$ is well defined, then $\mathsf{E}\eta$ is also well defined and $\mathsf{E}\eta = \mathsf{E}\xi$.


Problem 2.6.4. Let $\xi$ be any extended random variable and let $\mu$ be any $\sigma$-finite measure with the property $\int_\Omega |\xi| \, d\mu < \infty$. Prove that $|\xi| < \infty$ ($\mu$-a. e.). (Comp. with Property J.)


Problem 2.6.5. Suppose that $\mu$ is some $\sigma$-finite measure and that $\xi$ and $\eta$ are extended random variables for which $\int \xi \, d\mu$ and $\int \eta \, d\mu$ are well defined. Prove that if one can claim that $\int_A \xi \, d\mu \le \int_A \eta \, d\mu$ for any set $A \in \mathscr{F}$, then one can also claim that $\xi \le \eta$ ($\mu$-a. e.). (Comp. with property I.)


Problem 2.6.6. Assuming that $\xi$ and $\eta$ are two independent random variables, prove that $\mathsf{E}\xi\eta = \mathsf{E}\xi \cdot \mathsf{E}\eta$.

Hint. Instead of $\xi$ and $\eta$ consider the simple random variables $\xi_n$ and $\eta_n$, chosen so that $\xi_n \uparrow \xi$ and $\eta_n \uparrow \eta$. According to [P §2.6, Theorem 6] one must have $\mathsf{E}\xi_n \eta_n = \mathsf{E}\xi_n \, \mathsf{E}\eta_n$. The proof can be completed by using the monotone convergence theorem.


Problem 2.6.7. By using Fatou’s lemma prove that 


$$\mathsf{P}(\liminf A_n) \le \liminf \mathsf{P}(A_n), \qquad \mathsf{P}(\limsup A_n) \ge \limsup \mathsf{P}(A_n).$$


Problem 2.6.8. Construct an example that proves that, in general, in the dominated convergence theorem one cannot relax the condition "$|\xi_n| \le \eta$, $\mathsf{E}\eta < \infty$".

Hint. Let $\Omega = [0, 1]$, let $\mathscr{F} = \mathscr{B}([0, 1])$, suppose that $\mathsf{P}$ is the Lebesgue measure on $[0, 1]$, and then consider the random variables $\xi_n(\omega) = -n I(\omega \le 1/n)$, $n \ge 1$.


Problem 2.6.9. By way of example, prove that, in general, in the dominated convergence theorem one cannot remove the condition "$\xi_n \le \eta$, $\mathsf{E}\eta > -\infty$".


Problem 2.6.10. Prove the following variant of Fatou's lemma: if the family of random variables $\{\xi_n^+, n \ge 1\}$ is uniformly integrable, then

$$\limsup \mathsf{E}\xi_n \le \mathsf{E} \limsup \xi_n.$$

Hint. Use the fact that for any $\varepsilon > 0$ one can find some $c > 0$ such that $\mathsf{E}[\xi_n I(\xi_n > c)] < \varepsilon$, for all $n \ge 1$.


Problem 2.6.11. The Dirichlet function is given by

$$d(x) = \begin{cases} 1, & x \text{ is rational}, \\ 0, & x \text{ is irrational}. \end{cases}$$

This function is Lebesgue-integrable (on $[0, 1]$), but is not Riemann-integrable. Why?


Problem 2.6.12. Give an example of a sequence of Riemann-integrable functions $(f_n)_{n \ge 1}$, which are defined on $[0, 1]$ and are such that $|f_n| \le 1$, $f_n \to f$ Lebesgue-almost everywhere, and yet the limit $f$ is not Riemann-integrable.

Hint. Consider the function $f_n(x) = \sum_{i=1}^n I_{\{q_i\}}(x)$, where $\{q_1, q_2, \ldots\}$ is the set of all rational numbers in $[0, 1]$.


Problem 2.6.13. Let $\{a_{ij};\ i, j \ge 1\}$ be any sequence of real numbers with $\sum_{i,j} |a_{ij}| < \infty$. By using Fubini's theorem, prove that

$$\sum_{(i,j)} a_{ij} = \sum_i \Big( \sum_j a_{ij} \Big) = \sum_j \Big( \sum_i a_{ij} \Big). \qquad (*)$$

Hint. Consider an arbitrary sequence of positive numbers $p_1, p_2, \ldots$ with $\sum_i p_i = 1$ and define the probability measure $\mathsf{P}$ on $\Omega = \mathbb{N} = \{1, 2, \ldots\}$ according to the formula $\mathsf{P}(A) = \sum_{i \in A} p_i$. Then define the function $f(i, j) = \dfrac{a_{ij}}{p_i p_j}$, observe that

$$\int_{\Omega \times \Omega} |f(\omega_1, \omega_2)| \, d(\mathsf{P} \times \mathsf{P}) = \sum_{i,j} |f(i, j)| \, p_i p_j = \sum_{i,j} |a_{ij}| < \infty,$$

and use Fubini's theorem.


Problem 2.6.14. Give an example of a sequence $(a_{ij};\ i, j \ge 1)$ for which $\sum_{i,j} |a_{ij}| = \infty$ and the second identity in (*) (Problem 2.6.13) does not hold.

Hint. Consider the sequence

$$a_{ij} = \begin{cases} 0, & i = j, \\ (i - j)^{-3}, & i \ne j. \end{cases}$$

Problem 2.6.15. Starting with simple functions and using the results concerning the passage to the limit under the Lebesgue integral, prove the following version of the change of variables theorem.

Let $h = h(y)$ be any non-decreasing and continuously differentiable function defined on the interval $[a, b]$ and let $f(x)$ be any integrable (relative to the standard Lebesgue measure $dx$) function on the interval $[h(a), h(b)]$. Then the function $f(h(y)) h'(y)$ is Lebesgue-integrable on the interval $[a, b]$ and

$$\int_{h(a)}^{h(b)} f(x) \, dx = \int_a^b f(h(y)) h'(y) \, dy.$$

Hint. First prove the result for functions $f$ that can be written as finite linear combinations of indicators of Borel sets. By using the monotone convergence theorem then extend the result to all non-negative functions $f$ and, finally, prove the result for arbitrary functions $f$ by using the usual representation $f = f^+ - f^-$.


Problem 2.6.16. Verify formula [ P §2.6, (70)]. 

Hint. Consider the random variable $\tilde{\xi} = -\xi$, which has a distribution function $\tilde{F}(x) = 1 - F((-x)-)$, and notice that $\int_{-\infty}^0 |x|^n \, dF(x) = \int_0^\infty x^n \, d\tilde{F}(x)$. Use formula [P §2.6, (69)].


Problem 2.6.17. Let $\xi, \xi_1, \xi_2, \ldots$ be any sequence of non-negative random variables such that $\xi_n$ converges in probability $\mathsf{P}$ to the random variable $\xi$, i.e., for every $\varepsilon > 0$,

$$\mathsf{P}(|\xi_n - \xi| > \varepsilon) \to 0, \quad n \to \infty \quad (\text{notation: } \xi_n \xrightarrow{\mathsf{P}} \xi; \text{ see [P §2.10]}).$$

(a) Generalize [P §2.6, Theorem 5] by showing that if $\mathsf{E}\xi_n < \infty$, $n \ge 1$, then the following claim can be made: $\mathsf{E}\xi_n \to \mathsf{E}\xi < \infty$ if and only if the family $\{\xi_n, n \ge 1\}$ is uniformly integrable; in other words, the statement of Theorem 5 remains valid if the convergence with probability 1 is replaced by convergence in probability.

(b) Prove that if all random variables $\xi, \xi_1, \xi_2, \ldots$ are integrable, i.e., $\mathsf{E}\xi < \infty$ and $\mathsf{E}\xi_n < \infty$, $n \ge 1$, then

$$\mathsf{E}\xi_n \to \mathsf{E}\xi \implies \mathsf{E}|\xi_n - \xi| \to 0.$$

Hint. (a) The sufficiency follows from [P §2.6, Theorem 4] and Problem 2.10.1. The necessity can be established, as in [P §2.6, Theorem 5], by replacing the almost everywhere convergence with convergence in probability (one must again use Problem 2.10.1).

(b) Given any $c > 0$ one has

$$\mathsf{E}|\xi - \xi_n| \le \mathsf{E}|\xi - (\xi \wedge c)| + \mathsf{E}|(\xi \wedge c) - (\xi_n \wedge c)| + \mathsf{E}|(\xi_n \wedge c) - \xi_n|.$$

By keeping $\varepsilon > 0$ fixed and by choosing $c > 0$ so that $\mathsf{E}|\xi - (\xi \wedge c)| \le \varepsilon$, one can claim (due to the assumptions) that $\mathsf{E}|(\xi \wedge c) - (\xi_n \wedge c)| \le \varepsilon$ and $\mathsf{E}|(\xi_n \wedge c) - \xi_n| \le 3\varepsilon$, for all sufficiently large $n$. Consequently, $\mathsf{E}|\xi - \xi_n| \le 5\varepsilon$, for all sufficiently large $n$.


Problem 2.6.18. Let $\xi$ be any integrable random variable, i.e., $\mathsf{E}|\xi| < \infty$.

(a) Prove that for any $\varepsilon > 0$ one can find some $\delta > 0$ with the property that for any $A \in \mathscr{F}$ with $\mathsf{P}(A) < \delta$ one has $\mathsf{E} I_A |\xi| < \varepsilon$ (absolute continuity property of the Lebesgue integral).

(b) Conclude from (a) that if $(A_n)_{n \ge 1}$ is some sequence of events for which $\lim_n \mathsf{P}(A_n) = 0$, then $\mathsf{E}(\xi I(A_n)) \to 0$, as $n \to \infty$.

Hint. Use Lemma 2 in [P §2.6, 5].

Remark. Comp. with (b) from [P §2.6, Theorem 3].

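Problem 2.6.19. Suppose that the random variables $\xi, \eta, \zeta$ and $\xi_n, \eta_n, \zeta_n$, $n \ge 1$, are such that (see the definition of convergence in probability $\xrightarrow{\mathsf{P}}$ in Problem 2.6.17)

$$\xi_n \xrightarrow{\mathsf{P}} \xi, \quad \eta_n \xrightarrow{\mathsf{P}} \eta, \quad \zeta_n \xrightarrow{\mathsf{P}} \zeta, \quad \eta_n \le \xi_n \le \zeta_n, \quad n \ge 1,$$

$$\mathsf{E}\zeta_n \to \mathsf{E}\zeta, \quad \mathsf{E}\eta_n \to \mathsf{E}\eta,$$

and the expectations $\mathsf{E}\xi$, $\mathsf{E}\eta$, $\mathsf{E}\zeta$ are all finite. Prove the following result known as Pratt's lemma:

(a) $\mathsf{E}\xi_n \to \mathsf{E}\xi$.

(b) If, in addition, $\eta_n \le 0 \le \zeta_n$, then $\mathsf{E}|\xi_n - \xi| \to 0$.

Conclude that if $\xi_n \xrightarrow{\mathsf{P}} \xi$, $\mathsf{E}|\xi_n| \to \mathsf{E}|\xi|$ and $\mathsf{E}|\xi| < \infty$, then $\mathsf{E}|\xi_n - \xi| \to 0$.

Give an example showing that if the additional condition in (b) is removed then it is possible that $\mathsf{E}|\xi_n - \xi| \not\to 0$.

Hint. For the random variables $\tilde{\eta}_n = 0$, $\tilde{\xi}_n = \xi_n - \eta_n$, $\tilde{\zeta}_n = \zeta_n - \eta_n$ and $\tilde{\eta} = 0$, $\tilde{\xi} = \xi - \eta$, $\tilde{\zeta} = \zeta - \eta$, one has $0 \le \tilde{\zeta}_n \xrightarrow{\mathsf{P}} \tilde{\zeta}$ and $\mathsf{E}\tilde{\zeta}_n \to \mathsf{E}\tilde{\zeta}$. According to part (a) in Problem 2.6.17 the family $\{\tilde{\zeta}_n, n \ge 1\}$ is uniformly integrable and, since $0 \le \tilde{\xi}_n \le \tilde{\zeta}_n$, the family $\{\tilde{\xi}_n, n \ge 1\}$ also must be uniformly integrable. Consequently, one can claim that $\mathsf{E}\tilde{\xi}_n \to \mathsf{E}\tilde{\xi}$ (and even that $\mathsf{E}|\tilde{\xi}_n - \tilde{\xi}| \to 0$). Because of the assumption $\mathsf{E}\eta_n \to \mathsf{E}\eta$, it follows that $\mathsf{E}\xi_n \to \mathsf{E}\xi$.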


Problem 2.6.20. Prove that $L_* f \le L^* f$ and, if the function $f$ is bounded and the measure $\mu$ is finite, then $L_* f = L^* f$ (see Remark 2 in [P §2.6, 11]).


Problem 2.6.21. Prove that for any bounded function $f$ one has $\mathsf{E}f = L_* f$ (see Remark 2 in [P §2.6, 11]).


Problem 2.6.22. Prove the final statement in Remark 2 in [P §2.6, 11]. 


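Problem 2.6.23. Let $F = F(x)$ be the distribution function of the random variable $X$. Prove that:

(a) $\mathsf{E}|X| < \infty \iff \int_{-\infty}^0 F(x) \, dx < \infty$ and $\int_0^\infty (1 - F(x)) \, dx < \infty$;

(b) $\mathsf{E}X^+ < \infty \iff \int_a^\infty \ln \dfrac{1}{F(x)} \, dx < \infty$ for some $a$.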


Hint. (b) Verify the following inequality 


EXX > a] < [In dx < ZO ETC > o) a>0. 


Problem 2.6.24. Prove that if $p > 0$ and $\lim_{x \to \infty} x^p \, \mathsf{P}\{|\xi| > x\} = 0$, then $\mathsf{E}|\xi|^r < \infty$ for all $r < p$. Give an example showing that if $r = p$ then one can have $\mathsf{E}|\xi|^p = \infty$.


Problem 2.6.25. Give an example of a probability density $f(x)$, which is not an even function, but nevertheless all odd moments vanish, i.e., $\int_{-\infty}^{\infty} x^k f(x) \, dx = 0$, $k = 1, 3, \ldots$.


Problem 2.6.26. Give an example of a sequence of random variables $\xi_n$, $n \ge 1$, that has the following property:

$$\mathsf{E} \sum_{n=1}^\infty \xi_n \ne \sum_{n=1}^\infty \mathsf{E}\xi_n.$$

Problem 2.6.27. Suppose that the random variable $X$ is such that for any $\alpha > 1$ one has

$$\frac{\mathsf{P}\{|X| > \alpha n\}}{\mathsf{P}\{|X| > n\}} \to 0 \quad \text{as } n \to \infty.$$

Prove that then $X$ admits finite moments of all orders.

Hint. Use the formula

$$\mathsf{E}|X|^N = N \int_0^\infty x^{N-1} \, \mathsf{P}(|X| > x) \, dx, \quad N \ge 1.$$


Problem 2.6.28. Let $X$ be any random variable that takes the values $k = 0, 1, 2, \ldots$ with probabilities $p_k$. The function $G(s) = \mathsf{E}s^X$ $(= \sum_{k=0}^\infty p_k s^k)$, $|s| \le 1$, is known as the generating function of the random variable $X$ (see Sect. A.3). Verify the following formulas:

(a) If $X$ is a Poisson random variable, i.e., $p_k = e^{-\lambda} \lambda^k / k!$, $k = 0, 1, 2, \ldots$, for some $\lambda > 0$, then

$$G(s) = \mathsf{E}s^X = e^{-\lambda(1 - s)}, \quad |s| \le 1.$$

(b) If the random variable $X$ has a geometric distribution, i.e., if $p_k = p q^k$, $k = 0, 1, 2, \ldots$, for some $0 < p < 1$ and $q = 1 - p$, then

$$G(s) = \frac{p}{1 - sq}, \quad |s| \le 1.$$

(c) If $X_1, \ldots, X_n$ are independent and identically distributed random variables with $\mathsf{P}\{X_1 = 1\} = p$, $\mathsf{P}\{X_1 = 0\} = q$ ($q = 1 - p$), then for the sum $X_1 + \ldots + X_n$ one has

$$G(s) = (ps + q)^n \quad \Big( = \sum_{k=0}^n C_n^k p^k q^{n-k} s^k \Big)$$

and, consequently, $\mathsf{P}\{X_1 + \ldots + X_n = k\} = C_n^k p^k q^{n-k}$.
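
The three closed forms in Problem 2.6.28 can be sanity-checked numerically before they are proved. The sketch below is only an illustration: the parameter values `lam` and `p`, the truncation level `terms`, and all function names are assumptions made for this example and do not come from the text.

```python
import math

def pgf(pmf, s, terms=120):
    """Truncated generating function G(s) = sum_{k < terms} p_k * s**k."""
    return sum(pmf(k) * s**k for k in range(terms))

lam, p = 1.5, 0.3          # illustrative parameter values (assumptions)
q = 1 - p

def poisson_pmf(k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def geometric_pmf(k):
    return p * q**k

def binomial_pgf(s, n):
    """Generating function of X_1 + ... + X_n computed from the binomial pmf."""
    return sum(math.comb(n, k) * p**k * q**(n - k) * s**k for k in range(n + 1))

for s in (-0.9, 0.0, 0.5, 1.0):
    # (a) Poisson: G(s) = exp(-lam * (1 - s))
    assert math.isclose(pgf(poisson_pmf, s), math.exp(-lam * (1 - s)), abs_tol=1e-12)
    # (b) geometric: G(s) = p / (1 - s * q)
    assert math.isclose(pgf(geometric_pmf, s), p / (1 - s * q), abs_tol=1e-12)
    # (c) sum of n Bernoulli variables: G(s) = (p * s + q)**n
    assert math.isclose(binomial_pgf(s, 7), (p * s + q) ** 7, abs_tol=1e-12)
```

A numerical agreement of this kind is of course not a proof; it only guards against sign or index slips in the formulas.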


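Problem 2.6.29. Let $X$ be any random variable that takes values in the set $\{0, 1, 2, \ldots\}$ and let $G(s) = \sum_{k=0}^\infty p_k s^k$, where $p_k = \mathsf{P}\{X = k\}$, $k \ge 0$. Assuming that $r \ge 1$, prove that:

(a) If $\mathsf{E}X^r < \infty$, then the factorial moment $\mathsf{E}(X)_r \equiv \mathsf{E}X(X - 1) \ldots (X - r + 1)$ is finite and $\mathsf{E}(X)_r = \lim_{s \to 1} G^{(r)}(s)$ $(= G^{(r)}(1))$, where $G^{(r)}(s)$ is the $r$-th derivative of $G(s)$.

(b) If $\mathsf{E}X^r = \infty$, then $\mathsf{E}(X)_r = \infty$ and $\lim_{s \to 1} G^{(r)}(s) = \infty$.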

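Problem 2.6.30. Let $X$ be any random variable which is uniformly distributed in the set $\{0, 1, \ldots, n\}$, i.e., $\mathsf{P}\{X = k\} = \frac{1}{n+1}$, where $k = 0, 1, \ldots, n$. Prove that

$$G(s) = \frac{1}{n + 1} \cdot \frac{1 - s^{n+1}}{1 - s}$$

and, after computing $\mathsf{E}X$ and $\mathsf{E}X^2$, establish the following relations:

$$\sum_{k=1}^n k = \frac{n(n + 1)}{2}, \qquad \sum_{k=1}^n k^2 = \frac{n(n + 1)(2n + 1)}{6}.$$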

Problem 2.6.31. (Continuation of Problem 1.1.13.) Consider the (not-necessarily independent) events $A_1, \ldots, A_n$, let $X_i = I_{A_i}$, $i = 1, \ldots, n$, and let $\Sigma_n = I_{A_1} + \ldots + I_{A_n}$. Prove that the generating function $G_{\Sigma_n}(s) = \mathsf{E}s^{\Sigma_n}$ is given by the formula:

$$G_{\Sigma_n}(s) = \sum_{m=0}^n S_m (s - 1)^m,$$

where

$$S_m = \sum_{1 \le i_1 < \ldots < i_m \le n} \mathsf{P}(A_{i_1} \cap \ldots \cap A_{i_m}) \quad \Big( = \sum_{1 \le i_1 < \ldots < i_m \le n} \mathsf{E} X_{i_1} \ldots X_{i_m} \Big)$$

(see Problem 1.1.12). Conclude that the probabilities of the events $B_m = \{\Sigma_n = m\}$ are given by the formula

$$\mathsf{P}(B_m) = \sum_{k=m}^n (-1)^{k-m} C_k^m S_k.$$

Hint. Use the relations $G_{\Sigma_n}(s) = \mathsf{E} \prod_{i=1}^n (1 + X_i(s - 1))$ and

$$\prod_{i=1}^n (1 + X_i(s - 1)) = 1 + \sum_{i=1}^n X_i (s - 1) + \sum_{1 \le i_1 < i_2 \le n} X_{i_1} X_{i_2} (s - 1)^2 + \ldots + \prod_{i=1}^n X_i \, (s - 1)^n.$$
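
The formula for $\mathsf{P}(B_m)$ in Problem 2.6.31 can be checked on a small finite probability space with exact rational arithmetic. This is a minimal sketch under stated assumptions: the sample space `omega` (12 equally likely points) and the four events in `events` are arbitrary choices made for illustration, and $S_0$ is taken to be 1 (the empty intersection).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

omega = range(12)                                   # hypothetical sample space
events = [set(range(0, 6)), {1, 3, 5, 7, 9},        # hypothetical events A_1..A_4
          {0, 4, 8, 9, 10}, {2, 3, 4}]
n = len(events)

def prob(event):
    """Uniform probability of a subset of omega."""
    return Fraction(len(event), len(omega))

def S(m):
    """S_m = sum of P(A_{i1} & ... & A_{im}) over all index sets of size m."""
    if m == 0:
        return Fraction(1)
    return sum(prob(set.intersection(*combo)) for combo in combinations(events, m))

def prob_exactly_by_formula(m):
    """P(B_m) = sum_{k=m}^{n} (-1)^(k-m) * C(k, m) * S_k."""
    return sum((-1) ** (k - m) * comb(k, m) * S(k) for k in range(m, n + 1))

def prob_exactly_direct(m):
    """P(exactly m of the events occur), by direct counting."""
    return prob({w for w in omega if sum(w in a for a in events) == m})

for m in range(n + 1):
    assert prob_exactly_by_formula(m) == prob_exactly_direct(m)
```

Because `Fraction` is used throughout, the comparison is exact and does not depend on floating-point tolerances.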


Problem 2.6.32. In addition to the generating functions $G(s)$, it is often useful to work with moment generating functions, which are defined as $M(s) = \mathsf{E}e^{sX}$, assuming that $s$ is chosen so that $\mathsf{E}e^{sX} < \infty$. Note that if the random variable $X$ is non-negative and $s = -\lambda$, where $\lambda \ge 0$, then the function $\hat{F}(\lambda) = M(-\lambda)$ $(= \mathsf{E}e^{-\lambda X})$ is nothing but the Laplace transform of the random variable $X$ with c.d.f. $F = F(x)$.

(a) Prove that if the moment generating function $M(s)$ is defined for all $s$ in some neighborhood of the origin ($s \in [-a, a]$, $a > 0$), then all derivatives $M^{(k)}(s)$, $k = 1, 2, \ldots$, exist at $s = 0$ and

$$M^{(k)}(0) = \mathsf{E}X^k.$$

This observation justifies the term "moment generating function" in reference to the function $M(s)$.

(b) Give an example of a random variable for which $M(s) = \infty$ for every $s > 0$.

(c) Prove that if $X$ has a Poisson distribution with $\lambda > 0$ then $M(s) = e^{-\lambda(1 - e^s)}$ for all $s \in \mathbb{R}$.

(d) Give an example of two random variables, $X$ and $Y$, which are not independent and, at the same time, $M_{X+Y}(s) = \mathsf{E}e^{s(X+Y)}$ is the product of the moment generating functions $M_X(s) = \mathsf{E}e^{sX}$ and $M_Y(s) = \mathsf{E}e^{sY}$.


Problem 2.6.33. Prove that if $0 < r < \infty$, $X_n \in L^r$ and $X_n \xrightarrow{\mathsf{P}} X$, then the following conditions are equivalent:

(i) The family $\{|X_n|^r, n \ge 1\}$ is uniformly integrable.
(ii) $X_n \to X$ in $L^r$.
(iii) $\mathsf{E}|X_n|^r \to \mathsf{E}|X|^r < \infty$.

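Problem 2.6.34. (Spitzer identity.) Let $X_1, X_2, \ldots$ be independent and identically distributed random variables with $\mathsf{E}|X_1| < \infty$, and let $S_k = X_1 + \cdots + X_k$, $M_k = \max(0, S_1, \ldots, S_k)$, $k \ge 1$. Prove that, for any $n \ge 1$,

$$\mathsf{E}M_n = \sum_{k=1}^n \frac{1}{k} \mathsf{E}S_k^+, \qquad (*)$$

where $S_k^+ = \max(0, S_k)$.

Hint. By using the relations

$$M_n = I(S_n > 0) M_n + I(S_n \le 0) M_n,$$

$$\mathsf{E}[I(S_n > 0) M_n] = \mathsf{E}[I(S_n > 0) X_1] + \mathsf{E}[I(S_n > 0) M_{n-1}]$$

and $\mathsf{E}[I(S_n > 0) X_1] = n^{-1} \mathsf{E}S_n^+$, one can prove by induction that

$$\mathsf{E}M_n = \frac{1}{n} \mathsf{E}S_n^+ + \mathsf{E}M_{n-1} = \frac{1}{n} \mathsf{E}S_n^+ + \Big[ \frac{1}{n-1} \mathsf{E}S_{n-1}^+ + \mathsf{E}M_{n-2} \Big] = \ldots = \sum_{k=2}^n \frac{1}{k} \mathsf{E}S_k^+ + \mathsf{E}M_1 = \sum_{k=1}^n \frac{1}{k} \mathsf{E}S_k^+.$$

Remark. One can derive (*) by differentiating in $t$ the more general Spitzer identity, according to which, for any $0 < u < 1$, one has

$$\sum_{k=0}^\infty u^k \, \mathsf{E}e^{itM_k} = \exp\Big\{ \sum_{k=1}^\infty \frac{u^k}{k} \mathsf{E}e^{itS_k^+} \Big\}.$$

The proof of the above relation is somewhat more involved than the proof of (*).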
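
For a step distribution with finitely many equally likely values, both sides of (*) in Problem 2.6.34 can be computed exactly by enumerating all paths. The sketch below assumes a hypothetical step set `values = (-2, 1, 3)` and a hypothetical helper name `spitzer_sides`; neither comes from the text.

```python
from fractions import Fraction
from itertools import product

def spitzer_sides(values, n):
    """Return (E M_n, sum_{k<=n} E S_k^+ / k) for i.i.d. steps uniform on `values`."""
    paths = list(product(values, repeat=n))
    weight = Fraction(1, len(paths))          # every path is equally likely
    e_max = Fraction(0)                       # accumulates E M_n
    e_s_plus = [Fraction(0)] * (n + 1)        # e_s_plus[k] accumulates E S_k^+
    for path in paths:
        s, running_max = 0, 0                 # running_max starts at 0, as in M_n
        for k, x in enumerate(path, start=1):
            s += x
            running_max = max(running_max, s)
            e_s_plus[k] += weight * max(0, s)
        e_max += weight * running_max
    rhs = sum(e_s_plus[k] / k for k in range(1, n + 1))
    return e_max, rhs

lhs, rhs = spitzer_sides((-2, 1, 3), 5)       # hypothetical step values, n = 5
assert lhs == rhs
```

The enumeration grows like `len(values) ** n`, so this check is only practical for small `n`.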


Problem 2.6.35. Let $S_0 = 0$, $S_n = X_1 + \cdots + X_n$, $n \ge 1$, be a simple random walk (see [P §8.8]) and let $\sigma = \min\{n > 0 : S_n = 0\}$. Prove that

$$\mathsf{E}\min(\sigma, 2m) = 2\mathsf{E}|S_{2m}| = 4m \, \mathsf{P}\{S_{2m} = 0\}, \quad m \ge 0.$$
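
The chain of identities in Problem 2.6.35 can be checked by exhaustive enumeration for small $m$. The sketch assumes that the walk is symmetric (steps $\pm 1$ with probability $1/2$ each) and that $\sigma$ is the first return time to zero; the function name `first_return_check` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def first_return_check(m):
    """Return (E min(sigma, 2m), 2 E|S_2m|, 4m P{S_2m = 0}) for the symmetric walk."""
    n = 2 * m
    paths = list(product((-1, 1), repeat=n))
    weight = Fraction(1, len(paths))
    e_min = e_abs = p_zero = Fraction(0)
    for path in paths:
        s, sigma = 0, None
        for k, x in enumerate(path, start=1):
            s += x
            if s == 0 and sigma is None:
                sigma = k                      # first return to zero
        # if the walk has not returned by time 2m, min(sigma, 2m) = 2m
        e_min += weight * (n if sigma is None else min(sigma, n))
        e_abs += weight * abs(s)
        p_zero += weight * int(s == 0)
    return e_min, 2 * e_abs, 4 * m * p_zero

for m in range(1, 5):
    a, b, c = first_return_check(m)
    assert a == b == c
```

For $m = 2$ all three quantities equal 3, which can also be confirmed by hand from the 16 paths of length 4.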

Problem 2.6.36. (a) Let $\xi$ be any standard Gaussian random variable ($\xi \sim \mathscr{N}(0, 1)$). By using integration by parts, prove that $\mathsf{E}\xi^k = (k - 1) \mathsf{E}\xi^{k-2}$, $k \ge 2$, and conclude that, for $k \ge 1$,

$$\mathsf{E}\xi^{2k-1} = 0 \quad \text{and} \quad \mathsf{E}\xi^{2k} = 1 \cdot 3 \cdots (2k - 3) \cdot (2k - 1) \quad (= (2k - 1)!!).$$

(b) Prove that for any random variable $X$ that has Gamma distribution (see [P §2.3, Table 3]) with $\beta = 1$ one has

$$\mathsf{E}X^k = \frac{\Gamma(k + \alpha)}{\Gamma(\alpha)}, \quad k \ge 1.$$

In particular, $\mathsf{E}X = \alpha$, $\mathsf{E}X^2 = \alpha(\alpha + 1)$ and, therefore, $\mathsf{D}X = \alpha$. Find an analog of the above formula when $\beta \ne 1$.

(c) Prove that for any Beta-distributed random variable $X$ (see [P §2.3, Table 3]) one must have

$$\mathsf{E}X^k = \frac{B(r + k, s)}{B(r, s)}, \quad k \ge 1.$$

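Problem 2.6.37. Prove that the function

$$\xi(\omega_1, \omega_2) = e^{-\omega_1 \omega_2} - 2 e^{-2 \omega_1 \omega_2}, \quad \omega_1 \in \Omega_1 = [1, \infty), \ \omega_2 \in \Omega_2 = (0, 1],$$

has the following properties:

(a) for any fixed $\omega_2$, $\xi$ is Lebesgue-integrable in the variable $\omega_1 \in \Omega_1$;
(b) for any fixed $\omega_1$, $\xi$ is Lebesgue-integrable in the variable $\omega_2 \in \Omega_2$,

and yet Fubini's theorem does not hold.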


Problem 2.6.38. Prove Beppo Levi's theorem, which claims the following: if the random variables $\xi_1, \xi_2, \ldots$ are integrable (i.e., $\mathsf{E}|\xi_n| < \infty$ for all $n \ge 1$), if $\sup_n \mathsf{E}\xi_n < \infty$, and if $\xi_n \uparrow \xi$ for some random variable $\xi$, then $\xi$ is integrable and one has $\mathsf{E}\xi_n \uparrow \mathsf{E}\xi$ (comp. with [P §2.6, Theorem 1a]).


Problem 2.6.39. Prove the following variation of Fatou's lemma: if $0 \le \xi_n \to \xi$ ($\mathsf{P}$-a.s.) and $\mathsf{E}\xi_n \le A < \infty$, $n \ge 1$, then $\xi$ must be integrable and $\mathsf{E}\xi \le A$.


Problem 2.6.40. (On the connection between the Riemann and the Lebesgue integrals.) Suppose that the Borel function $f = f(x)$, $x \in \mathbb{R}$, is integrable with respect to the Lebesgue measure $\lambda$, i.e., $\int_{\mathbb{R}} |f(x)| \, \lambda(dx) < \infty$. Prove that for any $\varepsilon > 0$ one can find:

(a) a step function of the form $f_\varepsilon(x) = \sum_{i=1}^n f_i I_{A_i}(x)$, the sets $A_i$ being bounded intervals, such that $\int_{\mathbb{R}} |f(x) - f_\varepsilon(x)| \, \lambda(dx) < \varepsilon$;

(b) an integrable continuous function $g_\varepsilon(x)$ that has bounded support and is such that $\int_{\mathbb{R}} |f(x) - g_\varepsilon(x)| \, \lambda(dx) < \varepsilon$.
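
Problem 2.6.41. Prove that if $\xi$ is any integrable random variable then

$$\mathsf{E}\xi = \int_0^\infty \mathsf{P}\{\xi > x\} \, dx - \int_{-\infty}^0 \mathsf{P}\{\xi < x\} \, dx.$$

Show also that for any $a > 0$ one must have

$$\mathsf{E}[\xi I(\xi > a)] = \int_a^\infty \mathsf{P}\{\xi > x\} \, dx + a \, \mathsf{P}\{\xi > a\}$$

and that if $\xi \ge 0$ then

$$\mathsf{E}[\xi I(\xi \le a)] = \int_0^a \mathsf{P}\{x < \xi \le a\} \, dx.$$


Problem 2.6.42. Let $\xi$ and $\eta$ be any two integrable random variables. Prove that

$$\mathsf{E}\xi - \mathsf{E}\eta = \int_{-\infty}^\infty \big[ \mathsf{P}\{\eta < x \le \xi\} - \mathsf{P}\{\xi < x \le \eta\} \big] \, dx.$$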


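Problem 2.6.43. Let $\xi$ be any non-negative random variable ($\xi \ge 0$) with Laplace transform $\hat{F}(\lambda) = \mathsf{E}e^{-\lambda \xi}$, $\lambda \ge 0$.

(a) Prove that for any $0 < r < 1$ one has

$$\mathsf{E}\xi^r = \frac{r}{\Gamma(1 - r)} \int_0^\infty \frac{1 - \hat{F}(\lambda)}{\lambda^{r+1}} \, d\lambda.$$

Hint: use the fact that

$$\frac{1}{r} \Gamma(1 - r) \, s^r = \int_0^\infty \frac{1 - e^{-s\lambda}}{\lambda^{r+1}} \, d\lambda$$

for any $s \ge 0$ and any $0 < r < 1$.

(b) Prove that for any $r > 0$ one has

$$\mathsf{E}\xi^{-r} = \frac{1}{r \, \Gamma(r)} \int_0^\infty \hat{F}(\lambda^{1/r}) \, d\lambda.$$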

Hint: use the fact that 


r = 7 
= aS ani, ar /sy'} ai, 


for any s > 0 and any r > 0. 

Problem 2.6.44. (a) Prove that in Hölder's inequality [P §2.6, (29)] the identity is attained if and only if $|\xi|^p$ and $|\eta|^q$ are linearly dependent $\mathsf{P}$-a.e., i.e., one can find constants $a$ and $b$, that are not simultaneously null, for which one has $\mathsf{P}$-a.e. $a|\xi|^p = b|\eta|^q$.

(b) Prove that in Minkowski's inequality [P §2.6, (31)] (with $1 < p < \infty$) the identity is attained if and only if one can find two constants, $a$ and $b$, that are not simultaneously null, for which one has $a\xi = b\eta$, $\mathsf{P}$-a.e.

(c) Prove that in Cauchy-Bunyakovsky's inequality [P §2.6, (24)] the identity is attained if and only if $\xi$ and $\eta$ are linearly dependent $\mathsf{P}$-a.e., i.e., $a\xi = b\eta$, $\mathsf{P}$-a.e., for some constants $a$ and $b$ that are not simultaneously null.


Problem 2.6.45. Suppose that $X$ is a random variable with $\mathsf{P}\{a \le X \le b\} = 1$, for some choice of $a, b \in \mathbb{R}$, $a < b$. Setting $m = \mathsf{E}X$ and $\sigma^2 = \mathsf{D}X$, prove that $\sigma^2 \le (m - a)(b - m)$, where equality is reached if and only if $\mathsf{P}\{X = a\} + \mathsf{P}\{X = b\} = 1$.


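Problem 2.6.46. Assuming that $X$ is a random variable with $\mathsf{E}|X| < \infty$, prove that:

(a) If $X > 0$ ($\mathsf{P}$-a.s.) then

$$\mathsf{E}\frac{1}{X} \ge \frac{1}{\mathsf{E}X}, \qquad \mathsf{E}\ln X \le \ln \mathsf{E}X, \qquad \mathsf{E}(X \ln X) \ge \mathsf{E}X \cdot \ln \mathsf{E}X,$$

where we suppose that $0 \cdot \ln 0 = 0$.

(b) If $X$ takes values only in the interval $[a, b]$, $0 < a < b < \infty$, then

$$1 \le \mathsf{E}X \cdot \mathsf{E}\frac{1}{X} \le \frac{(a + b)^2}{4ab}$$

(when do the equalities in the last relation hold?).

(c) If the random variable $X$ is positive and if $\mathsf{E}X^2 < \infty$, then the following lower bound estimate, known as the Paley-Zygmund inequality, holds: for any $0 < \lambda < 1$,

$$\mathsf{P}\{X > \lambda \mathsf{E}X\} \ge (1 - \lambda)^2 \frac{(\mathsf{E}X)^2}{\mathsf{E}X^2}.$$

(d) By using the above inequality, prove that if P{X < u} < c for some u > 0 and c > 0, then for every r > 0 one has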


u” 


EX’ 
~ T= (EX?) Er 


provided that the expression in the denominator is well defined and strictly positive. 
(e) Prove that if $X$ is a non-negative integer-valued random variable then

$$\mathsf{P}\{X > 0\} \ge \frac{(\mathsf{E}X)^2}{\mathsf{E}X^2}.$$
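
Problem 2.6.47. Suppose that $\xi$ is a random variable with $\mathsf{E}\xi = m$ and $\mathsf{E}(\xi - m)^2 = \sigma^2$. Prove Cantelli's inequalities:

$$\max\big( \mathsf{P}\{\xi - m \ge \varepsilon\}, \ \mathsf{P}\{\xi - m \le -\varepsilon\} \big) \le \frac{\sigma^2}{\sigma^2 + \varepsilon^2}, \quad \varepsilon > 0;$$

$$\mathsf{P}\{|\xi - m| \ge \varepsilon\} \le \frac{2\sigma^2}{\sigma^2 + \varepsilon^2}, \quad \varepsilon > 0.$$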


Problem 2.6.48. Suppose that $\xi$ is some random variable with $\mathsf{E}|\xi| < \infty$ and let $g = g(x)$ be any strictly convex function defined on the real line $\mathbb{R}$. Prove that $\mathsf{E}g(\xi) = g(\mathsf{E}\xi)$ if and only if $\xi = \mathsf{E}\xi$ ($\mathsf{P}$-a.s.).


Problem 2.6.49. Let $\xi$ be any integrable random variable, i.e., $\mathsf{E}|\xi| < \infty$. Prove that for any $\varepsilon > 0$ one can find a simple random variable $\xi_\varepsilon$ so that $\mathsf{E}|\xi - \xi_\varepsilon| \le \varepsilon$.


Problem 2.6.50. Consider the equation

$$Z_t = B(t) + \int_0^t Z_{s-} \, dA(s)$$

(comp. with equation [P §2.6, (74)]), where $A(t)$ and $B(t)$ are functions with locally bounded variations, which are right-continuous (for $t \ge 0$), admit left limits (for $t > 0$) and are such that $A(0) = B(0) = 0$, $\Delta A(t) > -1$, where $\Delta A(t) = A(t) - A(t-)$, $t > 0$, $\Delta A(0) = 0$.

Prove that, in the class of all locally bounded functions, the above equation admits a unique solution $\mathscr{E}_t(A, B)$, which, for any $t \ge 0$, is given by the formula:

$$\mathscr{E}_t(A, B) = \mathscr{E}_t(A) \int_0^t \frac{1}{\mathscr{E}_s(A)} \, dB(s).$$

Problem 2.6.51. Let $V(t)$ be any function with locally bounded variation, which is right-continuous (for $t \ge 0$), admits left limits (for $t > 0$), and satisfies the relation

$$V(t) \le K + \int_0^t V(s-) \, dA(s),$$

with some constant $K \ge 0$ and some non-decreasing and right-continuous function $A(t)$, which admits left limits and has the property $A(0) = 0$.

Prove that

$$V(t) \le K \mathscr{E}_t(A), \quad t \ge 0;$$

in particular, if $A(t) = \int_0^t a(s) \, ds$, $a(s) \ge 0$, then the function $V(t)$ satisfies the Gronwall-Bellman inequality:

$$V(t) \le K \exp\Big\{ \int_0^t a(s) \, ds \Big\}, \quad t \ge 0.$$

Problem 2.6.52. The derivation of Hölder's inequality [P §2.6, (29)] uses the inequality

$$ab \le \frac{a^p}{p} + \frac{b^q}{q},$$

in which $a > 0$, $b > 0$ and $p > 1$ and $q > 1$ are such that $\frac{1}{p} + \frac{1}{q} = 1$. Prove that the above inequality is a special case (with $h(x) = x^{p-1}$) of Young's inequality:

$$ab \le H(a) + \tilde{H}(b), \quad a > 0, \ b > 0,$$

where

$$H(a) = \int_0^a h(y) \, dy, \qquad \tilde{H}(b) = \int_0^b \tilde{h}(y) \, dy,$$

and $h = h(y)$, $y \in \mathbb{R}_+$, is some continuous and strictly increasing function with $h(0) = 0$, $\lim_{y \to \infty} h(y) = \infty$, while $\tilde{h} = \tilde{h}(y)$, $y \in \mathbb{R}_+$, is the inverse of the function $h = h(y)$, i.e.,

$$\tilde{h}(y) = \inf\{t : h(t) > y\}.$$

Note that since $h = h(y)$ is continuous and strictly increasing then one has $\tilde{h}(y) = h^{-1}(y)$.


Problem 2.6.53. Let $X$ be any random variable. Prove that the following implications are in force for any $a > 0$:

$$\mathsf{E}|X|^a < \infty \iff \sum_{n=1}^\infty n^{a-1} \, \mathsf{P}\{|X| \ge n\} < \infty.$$


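Problem 2.6.54. Let $\xi$ be any non-negative random variable. Prove that for any $r > 1$ one has

$$\int_0^\infty \frac{\mathsf{E}(\xi \wedge x^r)}{x^r} \, dx = \frac{r}{r - 1} \mathsf{E}\xi^{1/r}.$$

In particular,

$$\int_0^\infty \frac{\mathsf{E}(\xi \wedge x^2)}{x^2} \, dx = 2 \mathsf{E}\sqrt{\xi}.$$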


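Problem 2.6.55. Let $\xi$ be any random variable with $\mathsf{E}\xi \ge 0$, $0 < \mathsf{E}\xi^2 < \infty$ and let $\varepsilon \in [0, 1]$. Verify the following "inverse" of the Chebyshev inequality:

$$\mathsf{P}\{\xi > \varepsilon \mathsf{E}\xi\} \ge (1 - \varepsilon)^2 \frac{(\mathsf{E}\xi)^2}{\mathsf{E}\xi^2}.$$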


Problem 2.6.56. Let $(\Omega, \mathscr{F})$ be any measurable space and define the set function $\mu = \mu(B)$, $B \in \mathscr{F}$, so that

$$\mu(B) = \begin{cases} |B|, & \text{if } B \text{ is finite}, \\ \infty, & \text{if } B \text{ is not finite}, \end{cases}$$

where $|B|$ denotes the cardinality of the set $B$. Prove that the set function $\mu$ defined above is a measure (in the sense of [P §2.1, Definition 6]). This measure is known as counting measure. It is $\sigma$-finite if and only if the set $\Omega$ is at most countable.


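Problem 2.6.57. (On the Radon-Nikodym Theorem I.) Let $\lambda$, $\mu$ and $\nu$ be $\sigma$-finite measures defined on the measurable space $(\Omega, \mathscr{F})$ and suppose that the Radon-Nikodym derivatives $\frac{d\nu}{d\mu}$ and $\frac{d\mu}{d\lambda}$ exist. Then show that the Radon-Nikodym derivative $\frac{d\nu}{d\lambda}$ also exists and

$$\frac{d\nu}{d\lambda} = \frac{d\nu}{d\mu} \cdot \frac{d\mu}{d\lambda} \quad (\lambda\text{-a.e.}).$$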


Problem 2.6.58. (On the Radon-Nikodym Theorem II.) Consider the measure space $(\Omega, \mathscr{F}) = ([0, 1], \mathscr{B}([0, 1]))$, let $\lambda$ be the Lebesgue measure and let $\mu$ be the counting measure (as in Problem 2.6.56) on $\mathscr{F}$. Prove that $\lambda \ll \mu$, but, at the same time, the Radon-Nikodym theorem, which guarantees the existence of the density $\frac{d\lambda}{d\mu}$, is not valid.


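Problem 2.6.59. Let $\lambda$ and $\mu$ be any two $\sigma$-finite measures on $(\Omega, \mathscr{F})$ and let $f = \frac{d\lambda}{d\mu}$. Prove that if $\mu\{\omega : f = 0\} = 0$, then the density $\frac{d\mu}{d\lambda}$ exists and can be represented by the function

$$\varphi = \begin{cases} \dfrac{1}{f} & \text{on the set } \{f \ne 0\}, \\ c & \text{on the set } \{f = 0\}, \end{cases}$$

where $c$ is some arbitrary constant.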


Problem 2.6.60. Prove that the following function on the interval $[0, \infty)$

$$f(x) = \begin{cases} 1, & x = 0, \\ \dfrac{\sin x}{x}, & x > 0, \end{cases}$$

is Riemann-integrable (in fact, $(R) \int_0^\infty f(x) \, dx = \frac{\pi}{2}$), but is not Lebesgue integrable.


Problem 2.6.61. Give an example of a function f = f(x) defined on [0, 1], which 
is bounded and Lebesgue integrable, and yet one cannot find a Riemann integrable 
function g = g(x) which coincides with f = f(x) Lebesgue-almost everywhere 
in [0, 1]. 


Problem 2.6.62. Give an example of a bounded Borel function $f = f(x, y)$ defined on $\mathbb{R}^2$, which is not Lebesgue-integrable on $(\mathbb{R}^2, \mathscr{B}(\mathbb{R}^2))$, but is such that, for every $y \in \mathbb{R}$ and every $x \in \mathbb{R}$, one has, respectively,

$$\int_{\mathbb{R}} f(x, y) \, dx = 0 \quad \text{and} \quad \int_{\mathbb{R}} f(x, y) \, dy = 0,$$

where both integrals are understood to exist in the sense of Lebesgue.

Problem 2.6.63. (On Fubini's Theorem I.) Let $\lambda = \lambda(dx)$ denote the Lebesgue measure on $[0, 1]$ and let $\mu = \mu(dy)$ be the counting measure on $[0, 1]$. Let $D$ denote the diagonal of the unit square $[0, 1]^2$. Prove that

$$\int_{[0,1]} \Big[ \int_{[0,1]} I_D(x, y) \, \lambda(dx) \Big] \mu(dy) = 0$$

and

$$\int_{[0,1]} \Big[ \int_{[0,1]} I_D(x, y) \, \mu(dy) \Big] \lambda(dx) = 1.$$

The above relations show that the property [P §2.6, (49)], in the conclusion of the Fubini theorem ([P §2.6, Theorem 3]), cannot hold without the finiteness assumption for the measure.


Problem 2.6.64. (On Fubini’s Theorem II.) Prove that Fubini’s theorem remains 
valid even if the requirement for the two participating measures to be finite is 
replaced with the requirement that these measures are o-finite. Prove that, in 
general, the assumption for o-finiteness of the participating measures cannot be 
relaxed further (see Problem 2.6.63). 


Problem 2.6.65. (Part a) in [P §2.6, Theorem 10].) Give an example of a 
bounded non-Borel function which is Riemann integrable (a reformulation of 
Problem 2.3.29). 


Problem 2.6.66. Let $f = f(x)$ be any Borel-measurable function defined on the measurable structure $(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n))$, which is endowed with the Lebesgue measure $\lambda = \lambda(dx)$. Assuming that $\int_{\mathbb{R}^n} |f(x)| \, \lambda(dx) < \infty$, prove that:

$$\lim_{h \to 0} \int_{\mathbb{R}^n} |f(x + h) - f(x)| \, \lambda(dx) = 0.$$


Hint. Use part (b) in Problem 2.6.40. 


Problem 2.6.67. For any finite number of independent and integrable random variables $\xi_1, \ldots, \xi_n$ one has

$$\mathsf{E} \prod_{k=1}^n \xi_k = \prod_{k=1}^n \mathsf{E}\xi_k$$

(see [P §2.6, Theorem 6]). Prove that if $\xi_1, \xi_2, \ldots$ is any sequence of independent and integrable random variables, then, in general, one has

$$\mathsf{E} \prod_{k=1}^\infty \xi_k \ne \prod_{k=1}^\infty \mathsf{E}\xi_k.$$

Problem 2.6.68. Suppose that the random variable $\xi$ is such that $\mathsf{E}\xi < 0$ and $\mathsf{E}e^{\theta \xi} = 1$ for some $\theta \ne 0$. Prove that this is only possible if $\theta > 0$.


Problem 2.6.69. Let $h = h(t, x)$ be any function defined on the set $[a, b] \times \mathbb{R}$, where $a, b \in \mathbb{R}$ and $a < b$.

(a) Suppose that

1. For any fixed $x^\circ \in \mathbb{R}$ the function $h(t, x^\circ)$, $t \in [a, b]$, is continuous;

2. For any fixed $t^\circ \in [a, b]$ the function $h(t^\circ, x)$, $x \in \mathbb{R}$, is $\mathscr{B}(\mathbb{R})$-(i.e., Borel)-measurable,

and prove that when the above conditions are satisfied one can claim that the function $h = h(t, x)$, $t \in [a, b]$, $x \in \mathbb{R}$, is $\mathscr{B}([a, b]) \times \mathscr{B}(\mathbb{R})$-measurable.

(b) Assume that $\xi$ is a random variable, defined on some probability space, and that conditions 1 and 2 above are satisfied, together with the condition

3. The family of random variables $\{h(t, \xi), t \in [a, b]\}$ is uniformly integrable.

Show that:

(i) The expected value $\mathsf{E}h(t, \xi)$ is a continuous function of the variable $t \in [a, b]$.

(ii) If $H(t, x) = \int_a^t h(s, x) \, ds$, then the derivative $\frac{d}{dt} \mathsf{E}H(t, \xi)$ exists for all $t \in (a, b)$ and equals $\mathsf{E}h(t, \xi)$, i.e.,

$$\frac{d}{dt} \mathsf{E} \int_a^t h(s, \xi) \, ds = \mathsf{E}h(t, \xi).$$


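Problem 2.6.70. (On [P §2.6, Lemma 2].) (a) Let $\xi$ be any random variable with $\mathsf{E}|\xi| < \infty$. Prove that for any $\varepsilon > 0$ one can find a $\delta > 0$, so that $\mathsf{P}(A) < \delta$, $A \in \mathscr{F}$, implies $\mathsf{E}(|\xi| I_A) \le \varepsilon$. Conclude that, given any random variable $\xi$ with $\mathsf{E}|\xi| < \infty$ and any $\varepsilon > 0$, one can find a constant $K = K(\varepsilon)$ so that

$$\mathsf{E}(|\xi|;\ |\xi| > K) \equiv \mathsf{E}(|\xi| I(|\xi| > K)) \le \varepsilon.$$

(b) Prove that if $\{\xi_n, n \ge 1\}$ is a uniformly integrable family of random variables, then the family $\big\{ \frac{1}{n} \sum_{k=1}^n \xi_k, \ n \ge 1 \big\}$ also must be uniformly integrable.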


Problem 2.6.71. Prove that Jensen's inequality [P §2.6, (25)] remains valid even when the function $g = g(x)$, assumed to be convex, is defined not on the entire real line $\mathbb{R}$, but only on some open set $G \subseteq \mathbb{R}$, and the random variable $\xi$ is such that $\mathsf{P}\{\xi \in G\} = 1$ and $\mathsf{E}|\xi| < \infty$. Prove that a function $g = g(x)$, which is defined on an open set $G$ and is convex, must be continuous. Prove that any such function admits the representation:

$$g(x) = \sup_n (a_n x + b_n), \quad x \in G,$$

where $a_n$ and $b_n$ are some appropriate constants.

Problem 2.6.72. Prove that for any $a, b \in \mathbb{R}$ and any $r > 0$ one has

$$|a + b|^r \le c_r (|a|^r + |b|^r),$$

where $c_r = 1$ when $r \le 1$ and $c_r = 2^{r-1}$ when $r \ge 1$. The above relation is known as the $c_r$-inequality.


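Problem 2.6.73. Assuming that $\xi$ and $\eta$ are two non-negative random variables with the property

$$\mathsf{P}\{\xi \ge x\} \le x^{-1} \mathsf{E}[\eta I(\xi \ge x)], \quad \text{for every } x > 0,$$

prove that

$$\mathsf{E}\xi^p \le \Big( \frac{p}{p - 1} \Big)^p \mathsf{E}\eta^p, \quad \text{for every } p > 1.$$

Hint. Consider first the case where $\xi$ is bounded, i.e., replace $\xi$ by $\xi_c = \xi \wedge c$, $c > 0$, in which case, according to [P §2.6, (69)], one must have

$$\mathsf{E}\xi_c^p = p \int_0^c x^{p-1} \, \mathsf{P}\{\xi \ge x\} \, dx.$$

Then prove the required property for $\mathsf{E}\xi_c^p$ and pass to the limit as $c \uparrow \infty$.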


Problem 2.6.74. Prove the following analog of the integration-by-substitution rule (see Problem 2.6.15 and [P §2.6, Theorem 7], regarding the change of variables in the Lebesgue integral).

Let $I$ be any open subset of $\mathbb{R}^n$ and let $y = \varphi(x)$ be any function which is defined on $I$ and takes values in $\mathbb{R}^n$ (if $x = (x_1, \ldots, x_n) \in I$, then $y = (y_1, \ldots, y_n)$ with $y_i = \varphi_i(x_1, \ldots, x_n)$, $i = 1, \ldots, n$). Suppose that all derivatives $\frac{\partial \varphi_i}{\partial x_j}$ are well defined and continuous and that $|J_\varphi(x)| > 0$, $x \in I$, where $J_\varphi(x)$ stands for the determinant of the Jacobian of the function $\varphi$, i.e.,

$$J_\varphi(x) = \det \Big\| \frac{\partial \varphi_i}{\partial x_j} \Big\|, \quad 1 \le i, j \le n.$$

As a consequence of the above assumptions, the set $\varphi(I) \subseteq \mathbb{R}^n$ is open, the function $\varphi$ admits an inverse $h = \varphi^{-1}$, and the Jacobian $J_h(y)$ exists and is continuous on $\varphi(I)$, with $|J_h(y)| > 0$, $y \in \varphi(I)$.

Prove that for every non-negative or integrable function $g = g(x)$, $x \in I$, one has

$$\int_I g(x) \, dx = \int_{\varphi(I)} g(h(y)) \, |J_h(y)| \, dy,$$

which can be written also as

$$\int_I g(x) \, dx = \int_{\varphi(I)} g(\varphi^{-1}(y)) \, |J_{\varphi^{-1}}(y)| \, dy,$$

where all integrals are understood as Lebesgue integrals in $\mathbb{R}^n$.

Problem 2.6.75. Let $F = F(x)$ be a cumulative distribution function with $F(0) = 0$ and $F(1) = 1$, which is Lipschitz continuous, in that $|F(x) - F(y)| \le L|x - y|$, $x, y \in [0, 1]$. Let $m$ be the measure on $[0, 1]$ given by $m(B) = \int_B dF(x)$, $B \in \mathscr{B}([0, 1])$, and let $\lambda$ be the Lebesgue measure on $[0, 1]$.

Prove that $m \ll \lambda$ and

$$\frac{dm}{d\lambda} \le L \quad (\lambda\text{-a.e.}).$$

Problem 2.6.76. Suppose that $g = g(x)$ is some function which is defined on the interval $[a, b] \subseteq \mathbb{R}$ and is convex (i.e., $g((1 - \lambda)x + \lambda y) \le (1 - \lambda) g(x) + \lambda g(y)$ for any $x, y \in [a, b]$ and any $0 \le \lambda \le 1$). Prove that this function must be continuous on the interval $(a, b)$ and conclude that it is a Borel function.

Hint. Argue that convexity implies that for every choice of $x, y, z \in [a, b]$, $x < y < z$, one has

$$\frac{g(y) - g(x)}{y - x} \le \frac{g(z) - g(y)}{z - y}.$$

By using the above relation conclude that $g = g(x)$ must be continuous on the interval $(a, b)$.


Problem 2.6.77. Consider the generating function $G(s) = \sum_{k=0}^\infty p_k s^k$, associated with the discrete random variable $X$ with $\mathsf{P}\{X = k\} = p_k$, $k = 0, 1, 2, \ldots$, $\sum_{k=0}^\infty p_k = 1$ (see Problem 2.6.28), and let

$$q_k = \mathsf{P}\{X > k\}, \quad r_k = \mathsf{P}\{X \le k\}, \quad k = 0, 1, 2, \ldots.$$

Prove that the generating functions for the sequences $q = (q_k)_{k \ge 0}$ and $r = (r_k)_{k \ge 0}$ are given, respectively, by

$$G_q(s) = \frac{1 - G(s)}{1 - s}, \quad |s| < 1, \qquad G_r(s) = \frac{G(s)}{1 - s}, \quad |s| < 1.$$

Problem 2.6.78. (On the "probability for ruin"; see [P §1.9].) Let $S_0 = x$ and let $S_n = x + \xi_1 + \ldots + \xi_n$, $n \ge 1$, where $(\xi_k)_{k \ge 1}$ is some sequence of independent and identically distributed random variables with $\mathsf{P}\{\xi_k = 1\} = p$, $\mathsf{P}\{\xi_k = -1\} = q$, $p + q = 1$, and $x$ is some integer number with $0 \le x \le A$. Consider the stopping time for the random walk (or for the "game" between two players; see [P §1.9]), which is given by

$$\tau = \inf\{n \ge 0 : S_n = 0 \text{ or } S_n = A\}.$$

Consider also the probability $p_x(n) = \mathsf{P}\{\tau = n, S_n = 0\}$ for the stopping to occur with "ruin" (i.e., $\{S_n = 0\}$) in the $n$-th period.

Prove that the generating function $G_x(s) = \sum_{n=0}^\infty p_x(n) s^n$ satisfies the following recursive relation:

$$G_x(s) = ps \, G_{x+1}(s) + qs \, G_{x-1}(s),$$

with $G_0(s) = 1$ and $G_A(s) = 0$. By using this relation prove the formula

$$G_x(s) = \Big( \frac{q}{p} \Big)^x \frac{\lambda_1^{A-x}(s) - \lambda_2^{A-x}(s)}{\lambda_1^A(s) - \lambda_2^A(s)},$$

where

$$\lambda_1(s) = \frac{1}{2ps} \big\{ 1 + \sqrt{1 - 4pqs^2} \big\} \quad \text{and} \quad \lambda_2(s) = \frac{1}{2ps} \big\{ 1 - \sqrt{1 - 4pqs^2} \big\}.$$
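
The closed form in Problem 2.6.78 can be compared numerically with a direct computation of the probabilities $p_x(n)$. The sketch below is only an illustration under stated assumptions: the values `p = 0.4`, `A = 5`, `s = 0.9`, the truncation `n_max`, and the function names are all chosen for this example.

```python
import math

def ruin_pgf_closed_form(x, s, p, A):
    """G_x(s) from the formula with lambda_1(s), lambda_2(s); requires 0 < s <= 1."""
    q = 1 - p
    root = math.sqrt(1 - 4 * p * q * s * s)
    lam1 = (1 + root) / (2 * p * s)
    lam2 = (1 - root) / (2 * p * s)
    return (q / p) ** x * (lam1 ** (A - x) - lam2 ** (A - x)) / (lam1 ** A - lam2 ** A)

def ruin_pgf_by_recursion(x, s, p, A, n_max=500):
    """Truncated sum of p_x(n) * s**n, with p_x(n) obtained by propagating the walk."""
    q = 1 - p
    dist = [0.0] * (A + 1)
    dist[x] = 1.0
    total = dist[0]                      # p_x(0) = 1 only if x == 0
    for n in range(1, n_max + 1):
        new = [0.0] * (A + 1)
        for y in range(1, A):            # only interior states keep moving
            new[y + 1] += p * dist[y]
            new[y - 1] += q * dist[y]
        total += new[0] * s ** n         # mass that hits 0 exactly at step n
        dist = new
    return total

p, A, s = 0.4, 5, 0.9                    # illustrative values (assumptions)
for x in range(A + 1):
    assert math.isclose(ruin_pgf_closed_form(x, s, p, A),
                        ruin_pgf_by_recursion(x, s, p, A), abs_tol=1e-9)
```

The loop over `x` also covers the boundary values $G_0(s) = 1$ and $G_A(s) = 0$.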


Problem 2.6.79. Consider a lottery with tickets numbered 000000, ..., 999999 
and suppose that one of these tickets is chosen at random. Find the probability, P21, 
that the sum of the six digits on this ticket equals 21. 

Hint. Use the methodology based on generating functions, developed in Sect. A.3 (pp. 372-373). The answer is $P_{21} \approx 0.04$.
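
The generating-function approach mentioned in the hint amounts to reading off the coefficient of $s^{21}$ in $(1 + s + \ldots + s^9)^6$. A minimal sketch of that computation, with hypothetical names `digit_sum_counts`, `counts` and `p21`, could look as follows.

```python
def digit_sum_counts(n_digits=6):
    """Coefficients of (1 + s + ... + s^9)**n_digits: counts[t] = number of tickets with digit sum t."""
    counts = [1]                                   # polynomial "1"
    for _ in range(n_digits):
        new = [0] * (len(counts) + 9)
        for total, c in enumerate(counts):
            for digit in range(10):                # multiply by 1 + s + ... + s^9
                new[total + digit] += c
        counts = new
    return counts

counts = digit_sum_counts()
p21 = counts[21] / 10**6                           # 39662 / 1000000
assert counts[21] == 39662
```

The exact value is therefore 0.039662, which rounds to the answer 0.04 stated in the hint.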


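Problem 2.6.80. Suppose that $\xi$ is a random variable with unimodal probability density $f(x)$ that has maximum at the point $x_0$ (referred to as the mode, or the peak, of the respective distribution), so that $f(x)$ is non-decreasing for $x < x_0$ and is non-increasing for $x > x_0$.

Prove Gauss inequality:

$$\mathsf{P}\big\{ |\xi - x_0| \ge \varepsilon \sqrt{\mathsf{E}|\xi - x_0|^2} \big\} \le \frac{4}{9\varepsilon^2}, \quad \text{for every } \varepsilon > 0.$$

Hint. If the function $g(y)$ does not increase for $y > 0$, then

$$\varepsilon^2 \int_\varepsilon^\infty g(y) \, dy \le \frac{4}{9} \int_0^\infty y^2 g(y) \, dy, \quad \text{for any } \varepsilon > 0.$$

One can conclude from the above inequality that, given any $\varepsilon > 0$ and $d^2 = \mathsf{E}|\xi - x_0|^2$, one has

$$\mathsf{P}\{|\xi - x_0| \ge \varepsilon d\} \le \frac{4 \mathsf{E}[(\xi - x_0)/d]^2}{9\varepsilon^2} = \frac{4}{9\varepsilon^2}.$$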


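Problem 2.6.81. Suppose that the random variables $\xi_1, \ldots, \xi_n$ are independent and
identically distributed, with $\mathsf{P}\{\xi_1 > 0\} = 1$ and $\mathsf{D} \ln \xi_1 = \sigma^2$. Given any $\varepsilon > 0$,
prove that

$$\mathsf{P}\Big\{\xi_1 \cdots \xi_n \le \big(e^{\mathsf{E} \ln \xi_1}\big)^{n} e^{n\varepsilon}\Big\} \ge 1 - \frac{\sigma^2}{n\varepsilon^2}.$$

Hint. Use Chebyshev's inequality

$$\mathsf{P}\{|Y_n - \mathsf{E}Y_n| \le n\varepsilon\} \ge 1 - \frac{\mathsf{D}Y_n}{n^2\varepsilon^2}$$

with $Y_n = \sum_{i=1}^{n} \ln \xi_i$.


Problem 2.6.82. Let $\mathsf{P}$ and $\widetilde{\mathsf{P}}$ be probability measures on $(\Omega, \mathcal{F})$, such that $\widetilde{\mathsf{P}}$ is
absolutely continuous with respect to $\mathsf{P}$ ($\widetilde{\mathsf{P}} \ll \mathsf{P}$), with density that is bounded by
some constant $c \ge 1$:

$$\frac{d\widetilde{\mathsf{P}}}{d\mathsf{P}} \le c \quad (\mathsf{P}\text{-a.s.}).$$

Prove that there is a number $\alpha \in (0, 1]$ and a probability measure $\mathsf{Q}$ for which one
can write

$$\mathsf{P} = \alpha \widetilde{\mathsf{P}} + (1 - \alpha)\mathsf{Q}.$$

Hint. Choose an arbitrary constant $C > c$ and set $\alpha = 1/C$ and

$$\mathsf{Q}(A) = \frac{1}{1 - \alpha} \int_A \Big(1 - \alpha \frac{d\widetilde{\mathsf{P}}}{d\mathsf{P}}\Big)\, d\mathsf{P}.$$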


Problem 2.6.83. Let $\xi$ and $\eta$ be any two independent random variables with
$\mathsf{E}\xi = 0$. Prove that $\mathsf{E}|\xi - \eta| \ge \mathsf{E}|\eta|$.


Problem 2.6.84. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables taking values in $R = (-\infty, \infty)$ and set $S_0 = 0$,
$S_n = \xi_1 + \cdots + \xi_n$. The so-called “ladder indexes” (a.k.a. “ladder moments”) are
defined by the following recursive rule:

$$T_0 = 0, \qquad T_k = \inf\{n > T_{k-1} : S_n - S_{T_{k-1}} > 0\}, \quad k \ge 1,$$

where, as usual, we set $\inf \varnothing = \infty$. It is clear that

$$\mathsf{P}\{T_1 = n\} = \mathsf{P}\{S_1 \le 0, \ldots, S_{n-1} \le 0,\ S_n > 0\}, \quad\text{for all } n \ge 1.$$

Prove that the generating function $G(s) = \sum_{n=1}^{\infty} f_n s^n$ for the random variable $T_1$
(with $f_n = \mathsf{P}\{T_1 = n\}$) is given by the formula

$$G(s) = 1 - \exp\Big\{-\sum_{n=1}^{\infty} \frac{s^n}{n}\, \mathsf{P}\{S_n > 0\}\Big\}, \quad |s| < 1.$$


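Problem 2.6.85. With the notation adopted in the previous problem, setting

$$A = \sum_{n=1}^{\infty} \frac{1}{n}\, \mathsf{P}\{S_n \le 0\}, \qquad B = \sum_{n=1}^{\infty} \frac{1}{n}\, \mathsf{P}\{S_n > 0\},$$

prove that

$$\mathsf{P}\{T_1 < \infty\} = \begin{cases} 1, & \text{if } B = \infty, \\ 1 - e^{-B}, & \text{if } B < \infty, \end{cases} \qquad \mathsf{E}T_1 = \begin{cases} e^{A}, & \text{if } A < \infty, \\ \infty, & \text{if } A = \infty. \end{cases}$$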


Problem 2.6.86. Just as in Problem 2.6.84, let $\xi_1, \xi_2, \ldots$ be any sequence of
independent and identically distributed random variables with $\mathsf{E}\xi_1 > 0$, set $S_0 = 0$,
$S_n = \xi_1 + \cdots + \xi_n$, and let

$$\tau = \inf\{n \ge 1 : S_n > 0\}.$$

Prove that $\mathsf{E}\tau < \infty$.


Problem 2.6.87. Let everything be as in Problem 2.6.84, let $\mathcal{R}_N$ denote the breadth
(span) of the sequence $S_0, S_1, \ldots, S_N$, i.e., the total number of different values that
can be found in that sequence, and let

$$\sigma(0) = \inf\{n > 0 : S_n = 0\}$$

be the moment of the first return to $0$.
Prove that for $N \to \infty$ one has

$$\frac{\mathsf{E}\mathcal{R}_N}{N} \to \mathsf{P}\{\sigma(0) = \infty\}.$$

Note that $\mathsf{P}\{\sigma(0) = \infty\}$ is the probability of no return to $0$; compare this result with
Problem 1.9.7.


Remark. According to Problem 8.8.16, in the case of a simple random walk with
$\mathsf{P}\{\xi_1 = 1\} = p$ and $\mathsf{P}\{\xi_1 = -1\} = q$, one must have

$$\frac{\mathsf{E}\mathcal{R}_N}{N} \to |p - q| \quad\text{as } N \to \infty;$$

in other words,

$$\frac{\mathsf{E}\mathcal{R}_N}{N} \to \begin{cases} p - q, & \text{if } p > 1/2, \\ 0, & \text{if } p = 1/2, \\ q - p, & \text{if } p < 1/2. \end{cases}$$


Problem 2.6.88. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
that are uniformly distributed in the interval $(0, 1)$ and let

$$\nu = \min\{n \ge 2 : \xi_n > \xi_{n-1}\}, \qquad \mu(x) = \min\{n \ge 1 : \xi_1 + \cdots + \xi_n > x\},$$

where $0 < x \le 1$. Prove that:
(a) $\mathsf{P}\{\mu(x) > n\} = x^n/n!$, $n \ge 1$;
(b) $\mathrm{Law}(\nu) = \mathrm{Law}(\mu(1))$;
(c) $\mathsf{E}\nu = \mathsf{E}\mu(1) = e$.
Hint. (a) Consider proof by induction.
(b) One must show that $\mathsf{P}\{\nu > n\} = \mathsf{P}\{\xi_1 > \xi_2 > \cdots > \xi_n\} = 1/n!$.
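
Part (c) follows from (a) through the tail-sum formula $\mathsf{E}\mu = \sum_{n \ge 0} \mathsf{P}\{\mu > n\}$. The sketch below evaluates this sum numerically and also checks, by brute-force enumeration, that exactly one of the $n!$ orderings of $n$ distinct values is strictly decreasing, which is the combinatorial fact behind the hint to (b). The function names and the truncation at 60 terms are illustrative choices.

```python
import math
from itertools import permutations

def expected_mu(x, terms=60):
    """E mu(x) as the sum over n >= 0 of P{mu(x) > n} = x^n / n! (truncated)."""
    return sum(x**n / math.factorial(n) for n in range(terms))

def fraction_strictly_decreasing(n):
    """Share of the orderings of n distinct values that are strictly decreasing."""
    perms = list(permutations(range(n)))
    hits = sum(all(p[i] > p[i + 1] for i in range(n - 1)) for p in perms)
    return hits / len(perms)

print(expected_mu(1.0), math.e)
print(fraction_strictly_decreasing(4), 1 / math.factorial(4))
```
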


2.7 Conditional Probabilities and Conditional Expectations
with Respect to $\sigma$-algebras


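Problem 2.7.1. Suppose that $\xi$ and $\eta$ are two independent and identically
distributed random variables, such that $\mathsf{E}\xi$ exists. Prove that

$$\mathsf{E}(\xi \mid \xi + \eta) = \mathsf{E}(\eta \mid \xi + \eta) = \frac{\xi + \eta}{2} \quad\text{(a.e.)}.$$

Hint. Observe that for any $A \in \sigma(\xi + \eta)$ one has $\mathsf{E}\xi I_A = \mathsf{E}\eta I_A$.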


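Problem 2.7.2. Suppose that $\xi_1, \xi_2, \ldots$ are independent and identically distributed
random variables, such that $\mathsf{E}|\xi_1| < \infty$. Prove that

$$\mathsf{E}(\xi_1 \mid S_n, S_{n+1}, \ldots) = \frac{S_n}{n} \quad\text{(a.e.)},$$

where $S_n = \xi_1 + \cdots + \xi_n$.
Hint. Use the fact that $\mathsf{E}\xi_i I_A = \mathsf{E}\xi_j I_A$ for any $A \in \sigma(S_n, S_{n+1}, \ldots)$.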
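
A finite example makes the symmetry argument concrete. The sketch below takes three independent fair dice (an illustrative choice) and computes $\mathsf{E}(\xi_1 \mid S_3 = s)$ by exhaustive enumeration. It conditions on $S_n$ only, which is a simplification of the statement above: the later sums $S_{n+1}, S_{n+2}, \ldots$ add only increments that are independent of $\xi_1, \ldots, \xi_n$.

```python
from fractions import Fraction
from itertools import product

def cond_mean_first_given_sum(values, n):
    """E(xi_1 | S_n = s) for n i.i.d. draws that are uniform on `values`."""
    num, den = {}, {}
    for outcome in product(values, repeat=n):   # all outcomes are equally likely
        s = sum(outcome)
        num[s] = num.get(s, 0) + outcome[0]
        den[s] = den.get(s, 0) + 1
    return {s: Fraction(num[s], den[s]) for s in den}

table = cond_mean_first_given_sum(range(1, 7), 3)
print(all(value == Fraction(s, 3) for s, value in table.items()))
```
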


Problem 2.7.3. Suppose that the random elements $(X, Y)$ are such that there is
a regular distribution of the form $P_x(B) = \mathsf{P}(Y \in B \mid X = x)$. Prove that if
$\mathsf{E}|g(X, Y)| < \infty$, for some appropriate function $g$, then $P_X$-a.e. one has

$$\mathsf{E}[g(X, Y) \mid X = x] = \int g(x, y)\, P_x(dy).$$

Hint. By using the definition of a regular conditional distribution and the notion
of “$\pi$-$\lambda$-system” (see [P §2.2]), prove that for any function $g(x, y)$ of the form
$\sum_{i=1}^{n} \lambda_i I_{A_i}$, where $A_i \in \mathcal{B}(R^2)$, the map

$$x \rightsquigarrow \int g(x, y)\, P_x(dy)$$

must be $\mathcal{B}(R)/\mathcal{B}(R)$-measurable and

$$\mathsf{E}\, g(X, Y) I_{\{X \in B\}} = \int_B \Big(\int g(x, y)\, P_x(dy)\Big)\, Q(dx), \quad\text{for every } B \in \mathcal{B}(R),$$

where $Q$ stands for the distribution of the random variable $X$. Prove that these
properties hold for all bounded $\mathcal{B}(R^2)$-measurable functions, and then conclude
that they must hold for all $g(x, y)$ with $\mathsf{E}|g(X, Y)| < \infty$.


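Problem 2.7.4. Let $\xi$ be a random variable with (cumulative) distribution function
$F_\xi(x)$. Prove that

$$\mathsf{E}(\xi \mid a < \xi \le b) = \frac{\int_a^b x\, dF_\xi(x)}{F_\xi(b) - F_\xi(a)},$$

where we suppose that $F_\xi(b) - F_\xi(a) > 0$.
Hint. Use the fact that, by the very definition of conditional expectation, if
$\mathsf{E}[I(a < \xi \le b)] > 0$ one can write

$$\mathsf{E}(\xi \mid a < \xi \le b) = \frac{\mathsf{E}[\xi\, I(a < \xi \le b)]}{\mathsf{E}[I(a < \xi \le b)]}.$$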


Problem 2.7.5. Let $g = g(x)$ be any function defined on $R$ which is convex and is
such that $\mathsf{E}|g(\xi)| < \infty$. Prove that Jensen's inequality holds $\mathsf{P}$-a.e. for conditional
expected values, namely,

$$g(\mathsf{E}(\xi \mid \mathcal{G})) \le \mathsf{E}(g(\xi) \mid \mathcal{G}) \quad\text{(a.e.)}.$$

Hint. First use the fact that for the regular conditional distribution $Q(\omega; B)$,
associated with the random variable $\xi$, relative to the $\sigma$-algebra $\mathcal{G}$ one can write

$$\mathsf{E}(g(\xi) \mid \mathcal{G})(\omega) = \int_R g(x)\, Q(\omega; dx)$$

(see [P §2.7, Theorem 3]), and then use Jensen's inequality for standard expected
values.


Problem 2.7.6. Prove that the random variable $\xi$ and the $\sigma$-algebra $\mathcal{G}$ are independent
(i.e., the random variables $\xi$ and $I_B(\omega)$ are independent for every choice of
$B \in \mathcal{G}$) if and only if $\mathsf{E}(g(\xi) \mid \mathcal{G}) = \mathsf{E}g(\xi)$ for any Borel function $g(x)$, such that
$\mathsf{E}|g(\xi)| < \infty$.

Hint. If $A \in \mathcal{G}$ and $B \in \mathcal{B}(R)$, then, due to the independence between $\xi$ and $\mathcal{G}$,
we have $\mathsf{P}(A \cap \{g(\xi) \in B\}) = \mathsf{P}(A)\mathsf{P}\{g(\xi) \in B\}$, and, therefore, $\mathsf{E}(g(\xi) \mid \mathcal{G}) = \mathsf{E}g(\xi)$.
Conversely, when the last relation holds, setting $g(\xi) = I(\xi \in B)$, one finds
that

$$\mathsf{P}(A \cap \{\xi \in B\}) = \mathsf{P}(A)\mathsf{P}\{\xi \in B\}.$$

The independence between $\xi$ and $\mathcal{G}$ follows from the fact that in the last relation
$A \in \mathcal{G}$ and $B \in \mathcal{B}(R)$ are chosen arbitrarily.


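Problem 2.7.7. Suppose that $\xi$ is a non-negative random variable and let $\mathcal{G}$ be some
$\sigma$-algebra, $\mathcal{G} \subseteq \mathcal{F}$. Prove that $\mathsf{E}(\xi \mid \mathcal{G}) < \infty$ (a.e.) if and only if the measure $\mathsf{Q}$,
defined on the sets $A \in \mathcal{G}$ by $\mathsf{Q}(A) = \int_A \xi\, d\mathsf{P}$, is $\sigma$-finite.

Hint. To prove the sufficiency part, set $A_\infty = \{\mathsf{E}(\xi \mid \mathcal{G}) = \infty\}$, $A_n = \{\mathsf{E}(\xi \mid \mathcal{G}) \le n\}$,
and check that $\mathsf{Q}(A_n) \le n$, which implies that $\mathsf{Q}$ is $\sigma$-finite.

To prove the necessity part, one must conclude from the existence of the sets
$A_1, A_2, \ldots \in \mathcal{G}$, with $\bigcup_{i=1}^{\infty} A_i = \Omega$ and $\mathsf{Q}(A_i) < \infty$, $i = 1, 2, \ldots$, that
$\mathsf{E}(\xi \mid \mathcal{G}) < \infty$ ($\mathsf{P}$-a.s.).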


Problem 2.7.8. Prove that the conditional probability $\mathsf{P}(A \mid B)$ can be claimed to
be “continuous”, in the sense that if $\lim_n A_n = A$ and $\lim_n B_n = B$, with $\mathsf{P}(B_n) > 0$
and $\mathsf{P}(B) > 0$, then $\lim_n \mathsf{P}(A_n \mid B_n) = \mathsf{P}(A \mid B)$.


Problem 2.7.9. Let $\Omega = [0, 1]$, $\mathcal{F} = \mathcal{B}([0, 1])$, and let $\mathsf{P}$ denote the Lebesgue
measure. Suppose that $X(\omega)$ and $Y(\omega)$, $\omega \in \Omega$, are two independent random
variables that are uniformly distributed on $(0, 1)$. Consider a third random variable,
$Z(\omega) = |X(\omega) - Y(\omega)|$, which represents the distance between the random points
$X(\omega)$ and $Y(\omega)$. Prove that the distribution of $Z(\omega)$ admits density $f_Z(z) = 2(1 - z)$,
$0 \le z \le 1$, and conclude that $\mathsf{E}Z = 1/3$.
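
The value $\mathsf{E}Z = 1/3$ can be made plausible before proving it by replacing the two uniform variables with independent draws from a fine grid. The sketch below does this and also integrates the claimed density $2(1 - t)$ over $[0, z]$. The grid size and the function names are illustrative choices, and the grid computation is an approximation, not a proof.

```python
from fractions import Fraction

def mean_abs_difference_on_grid(n):
    """E|X - Y| when X and Y are independent and uniform on the n midpoints of a grid on (0, 1)."""
    pts = [(i + 0.5) / n for i in range(n)]
    return sum(abs(x - y) for x in pts for y in pts) / n**2

def cdf_from_density(z):
    """Integral of the claimed density 2(1 - t) over [0, z], i.e. 2z - z^2, for 0 <= z <= 1."""
    return 2 * z - z**2

approx_mean = mean_abs_difference_on_grid(400)
meeting_probability = cdf_from_density(Fraction(1, 6))
print(approx_mean, meeting_probability)
```

With $z = 1/6$ the distribution function gives $11/36$, which is the probability asked for in the “random meeting” problem (Problem 2.7.12) when the hour is rescaled to the unit interval and the waiting time of 10 minutes becomes $1/6$.
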


Problem 2.7.10. Two points, $A_1$ and $A_2$, are chosen at random in the disk
$\{(x, y): x^2 + y^2 \le R^2\}$; more specifically, $A_1$ and $A_2$ are sampled independently
and in such a way that (in polar coordinates, $A_i = (\rho_i, \theta_i)$, $i = 1, 2$)

$$\mathsf{P}(\rho_i \in dr,\ \theta_i \in d\theta) = \frac{r\, dr\, d\theta}{\pi R^2}, \quad i = 1, 2.$$

Prove that the random variable $\rho$, which represents the distance between $A_1$ and $A_2$,
admits density $f_\rho(r)$, given by

$$f_\rho(r) = \frac{2r}{\pi R^2}\left[2 \arccos\Big(\frac{r}{2R}\Big) - \frac{r}{R}\sqrt{1 - \Big(\frac{r}{2R}\Big)^2}\right],$$

where $0 < r < 2R$.


Problem 2.7.11. The point P = (x, y) is sampled randomly (clarify!) from the 
unit square, i.e., from the square with vertices (0,0), (0,1), (1, 1), (1, 0). Find the 
probability that the point P is closer to the point (1, 1), than to the point (1/2, 1/2). 


Problem 2.7.12. (The “random meeting” problem.) Person A and person B have 
agreed to meet between 7:00 p.m. and 8:00 p.m. at a particular location. They have 
both forgotten the exact meeting time and choose their respective arrival times 
randomly and independently from each other between 7:00 p.m. and 8:00 p.m., 
according to the uniform distribution on the interval [7:00, 8:00]. They both have 
patience to wait no longer than 10 minutes. Prove that the probability that A and B will
actually meet equals 11/36.


Problem 2.7.13. Let $X_1, X_2, \ldots$ be a sequence of independent random variables
and let $S_n = \sum_{i=1}^{n} X_i$. Prove that $S_1$ and $S_3$ are conditionally independent relative
to the $\sigma$-algebra $\sigma(S_2)$, generated by the random variable $S_2$.


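Problem 2.7.14. We say that the $\sigma$-algebras $\mathcal{G}_1$ and $\mathcal{G}_2$ are conditionally
independent relative to the $\sigma$-algebra $\mathcal{G}_3$ if

$$\mathsf{P}(A_1 A_2 \mid \mathcal{G}_3) = \mathsf{P}(A_1 \mid \mathcal{G}_3)\, \mathsf{P}(A_2 \mid \mathcal{G}_3), \quad\text{for all } A_i \in \mathcal{G}_i,\ i = 1, 2.$$

Prove that the conditional independence of $\mathcal{G}_1$ and $\mathcal{G}_2$ relative to $\mathcal{G}_3$ is equivalent to
the claim that any of the following conditions holds $\mathsf{P}$-a.e.:

(a) $\mathsf{P}(A_1 \mid \sigma(\mathcal{G}_2 \cup \mathcal{G}_3)) = \mathsf{P}(A_1 \mid \mathcal{G}_3)$, for all $A_1 \in \mathcal{G}_1$;

(b) $\mathsf{P}(B \mid \sigma(\mathcal{G}_2 \cup \mathcal{G}_3)) = \mathsf{P}(B \mid \mathcal{G}_3)$ for all $B \in \mathcal{P}_1$, where $\mathcal{P}_1$ is any $\pi$-system
of subsets of $\mathcal{G}_1$, such that $\mathcal{G}_1 = \sigma(\mathcal{P}_1)$;

(c) $\mathsf{P}(B_1 B_2 \mid \sigma(\mathcal{G}_2 \cup \mathcal{G}_3)) = \mathsf{P}(B_1 \mid \mathcal{G}_3)\, \mathsf{P}(B_2 \mid \mathcal{G}_3)$ for all sets $B_1 \in \mathcal{P}_1$ and
$B_2 \in \mathcal{P}_2$, where $\mathcal{P}_1$ and $\mathcal{P}_2$ are any two $\pi$-systems of subsets of, respectively, $\mathcal{G}_1$
and $\mathcal{G}_2$, chosen so that $\mathcal{G}_1 = \sigma(\mathcal{P}_1)$ and $\mathcal{G}_2 = \sigma(\mathcal{P}_2)$;

(d) $\mathsf{E}(X \mid \sigma(\mathcal{G}_2 \cup \mathcal{G}_3)) = \mathsf{E}(X \mid \mathcal{G}_3)$ for any $\sigma(\mathcal{G}_1 \cup \mathcal{G}_3)$-measurable random
variable $X$, for which the expectation $\mathsf{E}X$ exists (see [P §2.6, Definition 2]).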


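Problem 2.7.15. Prove the following generalization of Fatou's lemma for conditional
expectations (compare with (d) from [P §2.7, Theorem 2]).

Let $(\Omega, \mathcal{F}, \mathsf{P})$ be any probability space and let $(\xi_n)_{n \ge 1}$ be any sequence of
random variables, chosen so that the expectations $\mathsf{E}\xi_n$, $n \ge 1$, are well defined
and the limit $\mathsf{E} \varliminf \xi_n$ exists (and may equal $\pm\infty$; see [P §2.6, Definition 2]).

Suppose that $\mathcal{G}$ is some $\sigma$-algebra of events inside $\mathcal{F}$ chosen so that

$$\sup_{n \ge 1} \mathsf{E}(\xi_n^- I(\xi_n^- \ge a) \mid \mathcal{G}) \to 0 \quad\text{as } a \to \infty \quad (\mathsf{P}\text{-a.e.}).$$

Then

$$\mathsf{E}(\varliminf \xi_n \mid \mathcal{G}) \le \varliminf \mathsf{E}(\xi_n \mid \mathcal{G}) \quad (\mathsf{P}\text{-a.e.}).$$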


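Problem 2.7.16. Just as in the previous problem, let $(\xi_n)_{n \ge 1}$ be any sequence of
random variables, chosen so that all expectations $\mathsf{E}\xi_n$, $n \ge 1$, exist, and suppose
that $\mathcal{G}$ is some $\sigma$-algebra of events inside $\mathcal{F}$ chosen so that

$$\sup_n \lim_{k \to \infty} \mathsf{E}(|\xi_n| I(|\xi_n| \ge k) \mid \mathcal{G}) = 0 \quad (\mathsf{P}\text{-a.e.}). \qquad (*)$$

Prove that if $\xi_n \to \xi$ ($\mathsf{P}$-a.e.) and the expected value $\mathsf{E}\xi$ exists, then

$$\mathsf{E}(\xi_n \mid \mathcal{G}) \to \mathsf{E}(\xi \mid \mathcal{G}) \quad (\mathsf{P}\text{-a.e.}). \qquad (**)$$


Problem 2.7.17. Let everything be as in the previous problem, but suppose that $(*)$
is replaced with the condition $\sup_n \mathsf{E}(|\xi_n|^{\alpha} \mid \mathcal{G}) < \infty$ ($\mathsf{P}$-a.e.), for some $\alpha > 1$.
Prove that the convergence in $(**)$ still holds.


Problem 2.7.18. Prove that if $\xi_n \xrightarrow{L^p} \xi$, for some $p \ge 1$, then

$$\mathsf{E}(\xi_n \mid \mathcal{G}) \xrightarrow{L^p} \mathsf{E}(\xi \mid \mathcal{G}),$$

for any sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$.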


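Problem 2.7.19. Let $X$ and $Y$ be any two random variables with $\mathsf{E}X^2 < \infty$ and
$\mathsf{E}|Y| < \infty$.

(a) Setting $\mathsf{D}(X \mid Y) = \mathsf{E}[(X - \mathsf{E}(X \mid Y))^2 \mid Y]$, prove that $\mathsf{D}X = \mathsf{E}\mathsf{D}(X \mid Y) + \mathsf{D}\mathsf{E}(X \mid Y)$
(see Problem 1.8.2).

(b) Prove that $\mathrm{cov}(X, Y) = \mathrm{cov}(X, \mathsf{E}(Y \mid X))$.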
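
Both identities can be verified exactly on a finite joint distribution. The sketch below uses an arbitrary four-point joint law for $(X, Y)$ (the dictionary `joint` is a made-up example) and computes each side with rational arithmetic.

```python
from fractions import Fraction

# Hypothetical joint distribution: joint[(x, y)] = P{X = x, Y = y}.
joint = {(0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
         (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4)}

def expectation(f):
    return sum(p * f(x, y) for (x, y), p in joint.items())

def conditional(f, axis):
    """Conditional expectation of f(X, Y) given X (axis=0) or Y (axis=1), as a dict value -> mean."""
    num, den = {}, {}
    for point, p in joint.items():
        key = point[axis]
        num[key] = num.get(key, 0) + p * f(*point)
        den[key] = den.get(key, 0) + p
    return {key: num[key] / den[key] for key in den}

EX = expectation(lambda x, y: x)
EY = expectation(lambda x, y: y)
DX = expectation(lambda x, y: (x - EX) ** 2)

m = conditional(lambda x, y: x, 1)                    # E(X | Y = y)
v = conditional(lambda x, y: (x - m[y]) ** 2, 1)      # D(X | Y = y)
E_of_D = expectation(lambda x, y: v[y])
D_of_E = expectation(lambda x, y: (m[y] - EX) ** 2)

r = conditional(lambda x, y: y, 0)                    # E(Y | X = x)
cov_XY = expectation(lambda x, y: (x - EX) * (y - EY))
cov_X_r = expectation(lambda x, y: (x - EX) * (r[x] - EY))

print(DX == E_of_D + D_of_E, cov_XY == cov_X_r)
```
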


Problem 2.7.20. Is the sufficient statistic $T(\omega) = s(X_1(\omega)) + \cdots + s(X_n(\omega))$ in
[P §2.7, Example 5] minimal?


Problem 2.7.21. Prove the factorization identity [ P §2.7, (57)]. 


Problem 2.7.22. In the context of [ P §2.7, Example 2], prove that Eg(X; |T) = 


"ET, where X; (w) = x; form = (X1, .-., Xn), i = 1,...50. 


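Problem 2.7.23. Let $(\Omega, \mathcal{F}, \mathsf{P})$ be a probability space, let $A$, $B$ and $C_1, \ldots, C_n$ be
events chosen from the $\sigma$-algebra $\mathcal{F}$, and suppose that for any $i = 1, \ldots, n$ one has

$$\mathsf{P}(C_i) > 0, \qquad \mathsf{P}(A \mid C_i) \ge \mathsf{P}(B \mid C_i),$$

and $\bigcup_{i=1}^{n} C_i = \Omega$. Can one claim that $\mathsf{P}(A) \ge \mathsf{P}(B)$?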


Problem 2.7.24. Let $X$ and $Y$ be any two random variables with $\mathsf{E}|X| < \infty$,
$\mathsf{E}|Y| < \infty$, such that $\mathsf{E}(X \mid Y) \ge Y$ and $\mathsf{E}(Y \mid X) \ge X$ ($\mathsf{P}$-a.e.). Prove that $X = Y$
($\mathsf{P}$-a.e.).

Hint. Prove that it is enough to consider the case where the inequalities $\ge$ are
replaced by equalities.

Method I. Consider the function $g(u) = \arctan u$. Then $(X - Y)(g(X) - g(Y)) \ge 0$
($\mathsf{P}$-a.e.), and, at the same time, one can show that $\mathsf{E}[(X - Y)(g(X) - g(Y))] = 0$,
from which one can conclude that $X = Y$ ($\mathsf{P}$-a.e.).

Method II. Argue that $\mathsf{E}\frac{X^+ + 1}{Y^+ + 1} = 1$ and $\mathsf{E}\frac{Y^+ + 1}{X^+ + 1} = 1$. Then, setting
$Z = \frac{X^+ + 1}{Y^+ + 1}$, show that $\mathsf{E}\big(\sqrt{Z} - 1/\sqrt{Z}\big)^2 = 0$ and conclude that $\mathsf{P}\{X^+ = Y^+\} = 1$.
One can show, analogously, that $\mathsf{P}\{X^- = Y^-\} = 1$.

Method III. Prove that if $\mathsf{P}\{X < Y\} > 0$ then there is a constant $c$ with the
property $\mathsf{P}\{X < c < Y\} > 0$. Consider the sets $A = \{X < c\}$ and $B = \{Y > c\}$
and argue that


[x -narafa-nars f Y- xyaP+ [ (Y —X)dP <0, 


which contradicts $\int_{\Omega} (Y - X)\, d\mathsf{P} = 0$.

Remark. It is not possible to find two random variables $X$ and $Y$ with $\mathsf{E}|X| < \infty$
and $\mathsf{E}|Y| < \infty$, such that the strict inequalities $\mathsf{E}(X \mid Y) > Y$ and $\mathsf{E}(Y \mid X) > X$
both hold with probability 1. Indeed, assuming that such random variables exist
would lead to a contradiction: $\mathsf{E}X = \mathsf{E}\mathsf{E}(X \mid Y) > \mathsf{E}Y = \mathsf{E}\mathsf{E}(Y \mid X) > \mathsf{E}X$.


Problem 2.7.25. Assuming that $X$ is some geometrically distributed random variable
with

$$\mathsf{P}\{X = k\} = p q^{k-1}, \quad k = 1, 2, \ldots, \quad 0 < p < 1, \quad q = 1 - p, \qquad (*)$$

prove that

$$\mathsf{P}\{X > n + m \mid X > n\} = \mathsf{P}\{X > m\}, \quad\text{for } m, n \in \{1, 2, \ldots\}. \qquad (**)$$

What is the interpretation of the above property?

In addition, prove the converse statement: if a discrete random variable takes
values in the set $\{1, 2, \ldots\}$ and has the property $(**)$, then it must also have the
property $(*)$.

(Compare with Problem 2.7.45.)
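
The lack-of-memory identity for the geometric law is easy to confirm with exact arithmetic. In the sketch below the tail $\mathsf{P}\{X > n\}$ is obtained by direct summation of the probabilities $p q^{k-1}$ rather than from the closed form $q^n$, so the check does not presuppose what is to be proved. The value of `p` and the function name are illustrative choices.

```python
from fractions import Fraction

def tail(p, n):
    """P{X > n} for P{X = k} = p q^(k-1), k = 1, 2, ..., computed by direct summation."""
    q = 1 - p
    return 1 - sum(p * q ** (k - 1) for k in range(1, n + 1))

p = Fraction(2, 7)
memoryless = all(
    tail(p, n + m) / tail(p, n) == tail(p, m)
    for n in range(1, 8) for m in range(1, 8)
)
print(memoryless)
```
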


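Problem 2.7.26. Prove that the random vectors $(X, Y)$ and $(\widetilde{X}, Y)$ share the same
distribution ($(X, Y) \overset{d}{=} (\widetilde{X}, Y)$), if and only if $\mathsf{P}\{X \in A \mid Y\} = \mathsf{P}\{\widetilde{X} \in A \mid Y\}$
($\mathsf{P}$-a.e.), for any event $A$.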


Problem 2.7.27. Let $X$ and $Y$ be any two independent Poisson random variables
with parameters, respectively, $\lambda > 0$ and $\mu > 0$. Prove that the conditional
distribution $\mathrm{Law}(X \mid X + Y)$ is binomial, i.e.,

$$\mathsf{P}(X = k \mid X + Y = n) = C_n^k \Big(\frac{\lambda}{\lambda + \mu}\Big)^{k} \Big(\frac{\mu}{\lambda + \mu}\Big)^{n-k}, \quad\text{for } 0 \le k \le n.$$
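
As a numerical illustration, the conditional probabilities can be computed straight from the Poisson probabilities and compared with the binomial formula. The parameter values and function names below are arbitrary illustrative choices.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def conditional_pmf(lam, mu, k, n):
    """P(X = k | X + Y = n) from the definition, for independent Poisson X and Y."""
    joint = poisson_pmf(lam, k) * poisson_pmf(mu, n - k)
    total = sum(poisson_pmf(lam, j) * poisson_pmf(mu, n - j) for j in range(n + 1))
    return joint / total

def binomial_pmf(n, k, prob):
    return math.comb(n, k) * prob**k * (1 - prob) ** (n - k)

lam, mu, n = 2.0, 3.5, 7
for k in range(n + 1):
    print(k, conditional_pmf(lam, mu, k, n), binomial_pmf(n, k, lam / (lam + mu)))
```
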


Problem 2.7.28. Suppose that $\xi$ is a random variable that is uniformly distributed
in the interval $[-a, b]$, with $a > 0$, $b > 0$, and, setting $\mathcal{G}_1 = \sigma(|\xi|)$ and $\mathcal{G}_2 = \sigma(\mathrm{sign}\, \xi)$,
calculate the conditional probabilities $\mathsf{P}(A \mid \mathcal{G}_i)$, $i = 1, 2$, for the events
$A = \{\xi > 0\}$ and $A = \{\xi \le \alpha\}$, where $\alpha \in [-a, b]$.


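Problem 2.7.29. Prove by way of example that the relation $\mathsf{E}(\xi + \eta \mid \mathcal{G}) = \mathsf{E}(\xi \mid \mathcal{G}) + \mathsf{E}(\eta \mid \mathcal{G})$
($\mathsf{P}$-a.e.) does not always hold (compare with property D* in [P §2.7, 4]).

Hint. It may happen that the expected value $\mathsf{E}(\xi + \eta \mid \mathcal{G})$ is well defined and
equals zero ($\mathsf{P}$-a.e.), while, at the same time, the sum $\mathsf{E}(\xi \mid \mathcal{G}) + \mathsf{E}(\eta \mid \mathcal{G})$ is not
defined.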


Problem 2.7.30. In the definition of the conditional probability $\mathsf{P}(B \mid \mathcal{G})(\omega)$ (of the
event $B \in \mathcal{F}$ relative to the $\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$; see Definition 2 in [P §2.7, 2]),
the map $\mathsf{P}(\cdot \mid \mathcal{G})(\omega)$ is not required to be a measure on $(\Omega, \mathcal{F})$ for $\mathsf{P}$-a.e. $\omega \in \Omega$.
Prove that such a requirement cannot be imposed; namely, construct an example
where the set of all $\omega \in \Omega$ for which $\mathsf{P}(\cdot \mid \mathcal{G})(\omega)$ fails to be a measure is not
$\mathsf{P}$-negligible.


Problem 2.7.31. Give an example of two independent random variables, $X$ and $Y$,
and a $\sigma$-algebra $\mathcal{G}$, chosen so that for some choice of the events $A$ and $B$ one has

$$\mathsf{P}(X \in A,\ Y \in B \mid \mathcal{G})(\omega) \ne \mathsf{P}(X \in A \mid \mathcal{G})(\omega)\, \mathsf{P}(Y \in B \mid \mathcal{G})(\omega),$$

for all $\omega$ inside some set of positive $\mathsf{P}$-measure. In other words, show that
independence does not imply conditional independence.
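
One possible example, offered here only as an illustration and not as the intended solution, takes two independent fair bits $X$ and $Y$ and the $\sigma$-algebra generated by their sum. On the event $\{X + Y = 1\}$ the two bits cannot both equal 1, although each equals 1 with conditional probability $1/2$. The sketch below checks this by enumeration.

```python
from fractions import Fraction
from itertools import product

# Two independent fair bits: every outcome (x, y) has probability 1/4.
outcomes = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

def prob(event):
    return sum((p for w, p in outcomes.items() if event(w)), Fraction(0))

def cond_prob(event, given):
    return prob(lambda w: event(w) and given(w)) / prob(given)

def sum_is_one(w):
    return w[0] + w[1] == 1       # an atom of sigma(X + Y)

unconditional_product = prob(lambda w: w[0] == 1) * prob(lambda w: w[1] == 1)
unconditional_joint = prob(lambda w: w == (1, 1))

lhs = cond_prob(lambda w: w == (1, 1), sum_is_one)
rhs = cond_prob(lambda w: w[0] == 1, sum_is_one) * cond_prob(lambda w: w[1] == 1, sum_is_one)
print(unconditional_joint == unconditional_product, lhs, rhs)
```

The event $\{X + Y = 1\}$ has probability $1/2$, so the discrepancy occurs on a set of positive measure.
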


Problem 2.7.32. If the family of random variables $\{\xi_n\}_{n \ge 1}$ is uniformly integrable
and $\xi_n \to \xi$ ($\mathsf{P}$-a.e.), then $\mathsf{E}\xi_n \to \mathsf{E}\xi$ (see b) in Theorem 4 from [P §2.6]). At
the same time the $\mathsf{P}$-a.e. convergence of the conditional expectations $\mathsf{E}(\xi_n \mid \mathcal{G}) \to \mathsf{E}(\xi \mid \mathcal{G})$
can be established (see a) in Theorem 2 from [P §2.6]) under the
assumption that $|\xi_n| \le \eta$, $\mathsf{E}\eta < \infty$, $n \ge 1$, and $\xi_n \to \xi$ ($\mathsf{P}$-a.e.).

Give an example showing that if the condition “$|\xi_n| \le \eta$, $\mathsf{E}\eta < \infty$, $n \ge 1$”
is replaced by the condition “the family $\{\xi_n\}_{n \ge 1}$ is uniformly integrable,” then
the convergence $\mathsf{E}(\xi_n \mid \mathcal{G}) \to \mathsf{E}(\xi \mid \mathcal{G})$ ($\mathsf{P}$-a.e.) may not hold. An analogous claim
can be made about condition a) in [P §2.6, Theorem 4] (i.e., Fatou's Lemma for
uniformly integrable random variables) in the case of conditional expected values
(see, however, Problems 2.7.15-2.7.17 above).


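Problem 2.7.33. Suppose that $(\Omega, \mathcal{F}, \mathsf{P})$ is identified with the probability space
$([0, 1], \mathcal{B}, \lambda)$, where $\lambda$ is the Lebesgue measure and $\mathcal{B}$ is the Borel $\sigma$-algebra on
$[0, 1]$. Give an example of a sub-$\sigma$-algebra $\mathcal{G} \subseteq \mathcal{F}$, chosen so that the Dirichlet
function

$$d(\omega) = \begin{cases} 1, & \text{if } \omega \text{ is irrational}, \\ 0, & \text{if } \omega \text{ is rational}, \end{cases}$$

is a version of the conditional expectation $\mathsf{E}(1 \mid \mathcal{G})(\omega)$. In particular, the conditional
expectation $\mathsf{E}(\xi \mid \mathcal{G})(\omega)$ of some “smooth” function $\xi = \xi(\omega)$ (for example,
$\xi(\omega) \equiv 1$) may have a version, which, as a function of $\omega$, may be “extremely
non-smooth”.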


Problem 2.7.34. If, given a random variable $\xi$, the expected value $\mathsf{E}\xi$ exists, then,
by property G* in [P §2.7, 4], one can write $\mathsf{E}(\mathsf{E}(\xi \mid \eta)) = \mathsf{E}\xi$ for any random
variable $\eta$. Give an example of two random variables $\xi$ and $\eta$, for which $\mathsf{E}(\mathsf{E}(\xi \mid \eta))$
is well defined, while $\mathsf{E}\xi$ is not.


Problem 2.7.35. Consider the sample space $\Omega = \{0, 1, 2, \ldots\}$ and suppose that
this space is endowed with a family of Poisson distribution laws $\mathsf{P}_\theta\{k\} = \frac{e^{-\theta}\theta^k}{k!}$,
$k \in \Omega$, parameterized by $\theta > 0$. Prove that it is not possible to construct an
unbiased estimator $T = T(\omega)$ for the parameter $1/\theta$, i.e., one cannot construct a
random variable $T = T(\omega)$, $\omega \in \Omega$, with the property $\mathsf{E}_\theta|T| < \infty$, for all $\theta > 0$,
and $\mathsf{E}_\theta T = 1/\theta$, for all $\theta > 0$.


Problem 2.7.36. Consider the statistical model $(\Omega, \mathcal{F}, \mathcal{P})$, where $\mathcal{P} = \{\mathsf{P}\}$ is
some dominated family of probability measures $\mathsf{P}$. Prove that if $\mathcal{G} \subseteq \mathcal{F}$ is some
sufficient $\sigma$-algebra then any $\sigma$-algebra $\widehat{\mathcal{G}}$ with $\mathcal{G} \subseteq \widehat{\mathcal{G}} \subseteq \mathcal{F}$ is also sufficient.
(Burkholder's example [18] shows that if the family $\mathcal{P}$ is not dominated, then, in
general, this claim cannot be made.)


Problem 2.7.37. Prove that each of the following structures represents a Borel
space:

(a) $(R^n, \mathcal{B}(R^n))$;

(b) $(R^\infty, \mathcal{B}(R^\infty))$;

(c) any complete separable metric space, i.e., any Polish space.


Problem 2.7.38. Assuming that $(E, \mathcal{E})$ is a Borel space, prove that there is a
countably-generated algebra $\mathcal{A}$, for which one can claim that $\sigma(\mathcal{A}) = \mathcal{E}$.


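Problem 2.7.39. (On the property K*.) Let $\eta$ be any $\mathcal{G}$-measurable random
variable, $\xi$ be any $\mathcal{F}$-measurable random variable and suppose that $\mathsf{E}|\eta|^q < \infty$
and $\mathsf{E}|\xi|^p < \infty$, where $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. Prove that $\mathsf{E}(\xi\eta \mid \mathcal{G}) = \eta\, \mathsf{E}(\xi \mid \mathcal{G})$.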


Problem 2.7.40. Given some symmetrically distributed random variable $X$ (i.e.,
$\mathrm{Law}(X) = \mathrm{Law}(-X)$), calculate the conditional distribution

$$\mathsf{P}(X \le x \mid \sigma(|X|))(\omega), \quad x \in R,$$

in terms of the (cumulative) distribution function $F(x)$, where $\sigma(|X|)$ stands for the
$\sigma$-algebra generated by $|X|$.


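Problem 2.7.41. Assuming that $A$ and $B$ are two events with $\mathsf{P}(A) = \alpha$ and
$\mathsf{P}(B) = 1 - \beta$, where $0 \le \beta < 1$ and $\beta \le \alpha$, prove that

$$\frac{\alpha - \beta}{1 - \beta} \le \mathsf{P}(A \mid B).$$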


Problem 2.7.42. Let $p_k$ denote the probability that a given family has $k$ children,
and suppose that

$$p_0 = p_1 = a \ (< 1/2) \quad\text{and}\quad p_k = (1 - 2a)\, 2^{-(k-1)}, \quad k \ge 2.$$

It is assumed that in any given birth the probability for a boy and the probability for
a girl both equal 1/2.

Assuming that a particular family already has two boys, what is the probability
that:

(a) The family has only two children;

(b) The family also has two girls.

Hint. The solution is a straightforward application of Bayes' formula. The two
probabilities are respectively 27/64 and 81/512.
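
The two answers can be reproduced numerically. The sketch below reads “has two boys” as “has exactly two boys”, which is the reading that yields the stated values. The infinite sum over family sizes is truncated at `k_max`, and the value of $a$ is an arbitrary choice, since the posterior probabilities do not depend on it.

```python
from fractions import Fraction
from math import comb

def family_posteriors(a, k_max=200):
    """Posterior probabilities of 2 and of 4 children, given exactly two boys (sum truncated at k_max)."""
    def p_children(k):              # valid for k >= 2
        return (1 - 2 * a) * Fraction(1, 2 ** (k - 1))

    def p_exactly_two_boys(k):
        return Fraction(comb(k, 2), 2 ** k)

    joint = {k: p_children(k) * p_exactly_two_boys(k) for k in range(2, k_max + 1)}
    total = sum(joint.values())
    # With 4 children and exactly two boys, the other two children are girls.
    return joint[2] / total, joint[4] / total

only_two, two_girls = family_posteriors(Fraction(1, 4))
print(float(only_two), 27 / 64)
print(float(two_girls), 81 / 512)
```
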


Problem 2.7.43. Suppose that $X$ is some symmetrically distributed random variable
(i.e., $X \overset{d}{=} -X$) and the function $g = g(x)$, $x \in R$, is chosen so that
$\mathsf{E}|g(X)| < \infty$. Prove that

$$\mathsf{E}[g(X) \mid |X|] = \frac{1}{2}\,[g(|X|) + g(-|X|)] \quad (\mathsf{P}\text{-a.e.}).$$


Problem 2.7.44. Assuming that $X$ is some non-negative random variable, calculate
the conditional probabilities

$$\mathsf{P}(X \le x \mid \lfloor X \rfloor) \quad\text{and}\quad \mathsf{P}(X \le x \mid \lceil X \rceil),$$

where $\lfloor X \rfloor$ stands for the largest integer which does not exceed $X$ and $\lceil X \rceil$ stands
for the smallest integer which is greater than or equal to $X$.


Problem 2.7.45. Assuming that $X$ is some exponentially distributed random variable
with parameter $\lambda > 0$, i.e., $\mathsf{P}\{X > x\} = e^{-\lambda x}$, $x \ge 0$, prove that for any two
non-negative real numbers, $x$ and $y$, one has

$$\mathsf{P}(X > x + y \mid X > x) = \mathsf{P}\{X > y\}.$$

The last relation is often interpreted as the “lack of memory” in the values of $X$.
Prove that if some extended (i.e., with values in $[0, \infty]$) random variable $X$ lacks
memory, i.e., has the above property, then only one of the following three cases is
possible: $\mathsf{P}\{X = 0\} = 1$, or $\mathsf{P}\{X = \infty\} = 1$, or $X$ is exponentially distributed
with some parameter $0 < \lambda < \infty$.

Hint. Setting $f(x) = \mathsf{P}\{X > x\}$, the lack of memory property can be expressed
as $f(x + y) = f(x) f(y)$. Consequently, the proof comes down to showing that, in
the class of all right-continuous functions $f(\cdot)$ with values in the interval $[0, 1]$, the
solution to the equation $f(x + y) = f(x) f(y)$ can be either of the form $f(x) \equiv 0$,
or of the form $f(x) \equiv 1$, or of the form $f(x) = e^{-\lambda x}$, for some parameter
$0 < \lambda < \infty$.


Problem 2.7.46. Assuming that the random variables X and Y have finite second 
moments, prove that: 

(a) cov(X, Y) = cov(X, E(Y | X)); 

(b) If E(Y | X) = 1 then DX < DXY. 


2.8 Random Variables II 


Problem 2.8.1. Establish the validity of formulas (9), (10), (24), (27), (28) and 
(34)-(38) in [ P §2.8]. 


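Problem 2.8.2. Suppose that $\xi_1, \ldots, \xi_n$, $n \ge 2$, are independent and identically
distributed random variables with (cumulative) distribution function $F(x)$ and, if it
exists, density $f(x)$. Let $\overline{\xi} = \max(\xi_1, \ldots, \xi_n)$, $\underline{\xi} = \min(\xi_1, \ldots, \xi_n)$ and $\rho = \overline{\xi} - \underline{\xi}$.
Prove that:

$$F_{\overline{\xi}, \underline{\xi}}(y, x) = \begin{cases} (F(y))^n - (F(y) - F(x))^n, & y > x, \\ (F(y))^n, & y \le x; \end{cases}$$

$$f_{\overline{\xi}, \underline{\xi}}(y, x) = \begin{cases} n(n-1)[F(y) - F(x)]^{n-2} f(x) f(y), & y > x, \\ 0, & y \le x; \end{cases}$$

$$F_\rho(x) = \begin{cases} n \int_{-\infty}^{\infty} [F(y) - F(y - x)]^{n-1} f(y)\, dy, & x \ge 0, \\ 0, & x < 0; \end{cases}$$

$$f_\rho(x) = \begin{cases} n(n-1) \int_{-\infty}^{\infty} [F(y) - F(y - x)]^{n-2} f(y - x) f(y)\, dy, & x > 0, \\ 0, & x < 0. \end{cases}$$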


Problem 2.8.3. Assuming that $\xi_1$ and $\xi_2$ are two independent Poisson random
variables with parameters, respectively, $\lambda_1 > 0$ and $\lambda_2 > 0$, prove that:

(a) $\xi_1 + \xi_2$ has Poisson distribution with parameter $\lambda_1 + \lambda_2$.

(b) The distribution of $\xi_1 - \xi_2$ is given by

$$\mathsf{P}\{\xi_1 - \xi_2 = k\} = e^{-(\lambda_1 + \lambda_2)} \Big(\frac{\lambda_1}{\lambda_2}\Big)^{k/2} I_k\big(2\sqrt{\lambda_1 \lambda_2}\big), \quad k = 0, \pm 1, \pm 2, \ldots,$$

where

$$I_k(2x) = x^k \sum_{r=0}^{\infty} \frac{x^{2r}}{r!\, \Gamma(k + r + 1)}$$

is the modified Bessel function of the first kind and of order $k$.
Hint. One possible proof of Part (b) is based on the series expansion of the
generating function of the random variable $\xi_1 - \xi_2$ (see Sect. A.3).
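
Part (b) can be checked numerically by comparing the direct convolution sum with the Bessel-function expression. The sketch below assumes the formula $e^{-(\lambda_1 + \lambda_2)} (\lambda_1/\lambda_2)^{k/2} I_k(2\sqrt{\lambda_1\lambda_2})$, uses the series for $I_k$ truncated at 80 terms, and relies on the symmetry $I_{-k} = I_k$ for integer $k$. The parameter values and function names are illustrative.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam**k / math.factorial(k)

def difference_pmf_direct(lam1, lam2, k, terms=80):
    """P{xi1 - xi2 = k} as the (truncated) sum over j of P{xi1 = j + k} P{xi2 = j}."""
    return sum(poisson_pmf(lam1, j + k) * poisson_pmf(lam2, j)
               for j in range(max(0, -k), terms))

def bessel_i(k, z, terms=80):
    """Series for I_k(z), integer k >= 0, written as I_k(2x) with x = z / 2."""
    x = z / 2.0
    return x**k * sum(x ** (2 * r) / (math.factorial(r) * math.gamma(k + r + 1))
                      for r in range(terms))

def difference_pmf_bessel(lam1, lam2, k):
    return (math.exp(-(lam1 + lam2)) * (lam1 / lam2) ** (k / 2)
            * bessel_i(abs(k), 2.0 * math.sqrt(lam1 * lam2)))

for k in range(-3, 4):
    print(k, difference_pmf_direct(1.5, 4.0, k), difference_pmf_bessel(1.5, 4.0, k))
```
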


Problem 2.8.4. Setting $m_1 = m_2 = 0$ in formula [P §2.8, (4)], show that

$$f_{\xi/\eta}(z) = \frac{\sigma_1 \sigma_2 \sqrt{1 - \rho^2}}{\pi(\sigma_2^2 z^2 - 2\rho\sigma_1\sigma_2 z + \sigma_1^2)}.$$


Problem 2.8.5. The maximal correlation coefficient between the random variables
$\xi$ and $\eta$ is defined as the quantity $\rho^*(\xi, \eta) = \sup_{u, v} \rho(u(\xi), v(\eta))$, where the
supremum is taken over all Borel functions $u = u(x)$ and $v = v(x)$, for which the
correlation coefficient $\rho(u(\xi), v(\eta))$ is meaningful. Prove that the random variables
$\xi$ and $\eta$ are independent if and only if $\rho^*(\xi, \eta) = 0$ (see Problem 2.8.6 below).

Hint. The necessity part of the statement is obvious. To prove the sufficiency part,
given two arbitrarily chosen sets $A$ and $B$, set $u(\xi) = I_A(\xi)$ and $v(\eta) = I_B(\eta)$ and
show that $\sup_{u, v} \rho(u(\xi), v(\eta)) = 0$ implies

$$\mathsf{P}\{\xi \in A,\ \eta \in B\} - \mathsf{P}\{\xi \in A\}\mathsf{P}\{\eta \in B\} = \rho(I_A(\xi), I_B(\eta)) = 0.$$

As the sets $A$ and $B$ are arbitrarily chosen, the last relation guarantees that $\xi$ and $\eta$
are independent.


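Problem 2.8.6. (Continuation of Problem 2.8.5.) Let $(\Omega, \mathcal{F}, \mathsf{P})$ be a probability
space, let $\mathcal{F}_1 \subseteq \mathcal{F}$ and $\mathcal{F}_2 \subseteq \mathcal{F}$ be any two sub-$\sigma$-algebras of $\mathcal{F}$, and let
$L^2(\Omega, \mathcal{F}_i, \mathsf{P})$, $i = 1, 2$, be the usual spaces of random variables with finite second
moment.

Set

$$\rho^*(\mathcal{F}_1, \mathcal{F}_2) = \sup |\rho(\xi_1, \xi_2)|,$$

where $\rho(\xi_1, \xi_2)$ is the correlation coefficient between $\xi_1$ and $\xi_2$ and the supremum is
taken with respect to all pairs of random variables $(\xi_1, \xi_2)$ with $\xi_i \in L^2(\Omega, \mathcal{F}_i, \mathsf{P})$,
$i = 1, 2$.
(a) Prove that if $\mathcal{F}_1 = \sigma(X_1)$ and $\mathcal{F}_2 = \sigma(X_2)$, for some random variables $X_1$
and $X_2$ on $(\Omega, \mathcal{F}, \mathsf{P})$, then

$$\rho^*(\sigma(X_1), \sigma(X_2)) = |\rho(X_1, X_2)|.$$

(b) Let $\mathcal{F}_1 = \bigvee_{i \in I} \mathcal{A}_i$ $\big(= \sigma\big(\bigcup_{i \in I} \mathcal{A}_i\big)\big)$ and $\mathcal{F}_2 = \bigvee_{i \in I} \mathcal{B}_i$ $\big(= \sigma\big(\bigcup_{i \in I} \mathcal{B}_i\big)\big)$, where
$I$ is some index set. Assuming that all $\sigma$-algebras $\sigma(\mathcal{A}_i, \mathcal{B}_i)$, $i \in I$, are jointly
independent ($\sigma(\mathcal{A}_i, \mathcal{B}_i)$ stands for the $\sigma$-algebra generated by the sets $A_i \in \mathcal{A}_i$ and
$B_i \in \mathcal{B}_i$), prove that

$$\rho^*\Big(\bigvee_{i \in I} \mathcal{A}_i,\ \bigvee_{i \in I} \mathcal{B}_i\Big) = \sup_{i \in I} \rho^*(\mathcal{A}_i, \mathcal{B}_i).$$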


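Problem 2.8.7. Let $(\Omega, \mathcal{F}, \mathsf{P})$ be any probability space and let $\mathcal{F}_1 \subseteq \mathcal{F}$ and $\mathcal{F}_2 \subseteq \mathcal{F}$
be any two sub-$\sigma$-algebras of $\mathcal{F}$. Define the following quantities, every one of
which measures the degree of mixing between $\mathcal{F}_1$ and $\mathcal{F}_2$:

$$\alpha(\mathcal{F}_1, \mathcal{F}_2) = \sup\{|\mathsf{P}(A \cap B) - \mathsf{P}(A)\mathsf{P}(B)| : A \in \mathcal{F}_1,\ B \in \mathcal{F}_2\};$$

$$\varphi(\mathcal{F}_1, \mathcal{F}_2) = \sup\{|\mathsf{P}(B \mid A) - \mathsf{P}(B)| : A \in \mathcal{F}_1,\ B \in \mathcal{F}_2,\ \mathsf{P}(A) > 0\};$$

$$\psi(\mathcal{F}_1, \mathcal{F}_2) = \sup\Big\{\Big|\frac{\mathsf{P}(A \cap B)}{\mathsf{P}(A)\mathsf{P}(B)} - 1\Big| : A \in \mathcal{F}_1,\ B \in \mathcal{F}_2,\ \mathsf{P}(A)\mathsf{P}(B) > 0\Big\}.$$

In addition, let

$$\beta(\mathcal{F}_1, \mathcal{F}_2) = \sup \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} |\mathsf{P}(A_i \cap B_j) - \mathsf{P}(A_i)\mathsf{P}(B_j)|,$$

where the supremum is taken over all finite partitions $\{A_1, \ldots, A_N\}$ and
$\{B_1, \ldots, B_M\}$, with $A_i \in \mathcal{F}_1$ and $B_j \in \mathcal{F}_2$, $1 \le i \le N$, $1 \le j \le M$ and
$N \ge 1$, $M \ge 1$.
Verify the following inequalities, in which the quantity $\rho^*(\mathcal{F}_1, \mathcal{F}_2)$ is as defined
in Problem 2.8.6:

$$\alpha(\mathcal{F}_1, \mathcal{F}_2) \le \beta(\mathcal{F}_1, \mathcal{F}_2) \le \varphi(\mathcal{F}_1, \mathcal{F}_2) \le \psi(\mathcal{F}_1, \mathcal{F}_2),$$

and

$$\alpha(\mathcal{F}_1, \mathcal{F}_2) \le \rho^*(\mathcal{F}_1, \mathcal{F}_2) \le 2\varphi^{1/2}(\mathcal{F}_1, \mathcal{F}_2),$$

$$\rho^*(\mathcal{F}_1, \mathcal{F}_2) \le 2\varphi^{1/2}(\mathcal{F}_1, \mathcal{F}_2)\, \varphi^{1/2}(\mathcal{F}_2, \mathcal{F}_1).$$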
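Problem 2.8.8. Assuming that $\tau_1, \ldots, \tau_k$ are independent and identically
distributed random variables, all having exponential distribution with density

$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0,$$

prove that $\tau_1 + \cdots + \tau_k$ is distributed with density

$$\frac{\lambda^k t^{k-1} e^{-\lambda t}}{(k - 1)!}, \quad t \ge 0,$$

and

$$\mathsf{P}(\tau_1 + \cdots + \tau_k > t) = \sum_{i=0}^{k-1} e^{-\lambda t} \frac{(\lambda t)^i}{i!}.$$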

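Problem 2.8.9. Assuming that $\xi \sim \mathcal{N}(0, \sigma^2)$, prove that for every $p \ge 1$ one has

$$\mathsf{E}|\xi|^p = C_p \sigma^p,$$

where

$$C_p = \frac{2^{p/2}}{\pi^{1/2}}\, \Gamma\Big(\frac{p + 1}{2}\Big)$$

and $\Gamma(s) = \int_0^{\infty} e^{-x} x^{s-1}\, dx$ is Euler's Gamma function. In particular, for any
integer $n \ge 1$ one can write (see Problem 2.6.36)

$$\mathsf{E}\xi^{2n} = (2n - 1)!!\, \sigma^{2n}.$$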
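
The special case of even moments can be checked numerically: for $p = 2n$ the constant $C_p$ should equal $(2n - 1)!! = 1 \cdot 3 \cdots (2n - 1)$. The sketch below assumes $C_p = 2^{p/2} \pi^{-1/2}\, \Gamma((p + 1)/2)$; the function names are illustrative.

```python
import math

def abs_moment_constant(p):
    """C_p = 2^(p/2) / sqrt(pi) * Gamma((p + 1) / 2), so that E|xi|^p = C_p sigma^p."""
    return 2 ** (p / 2) / math.sqrt(math.pi) * math.gamma((p + 1) / 2)

def double_factorial_odd(n):
    """(2n - 1)!! = 1 * 3 * ... * (2n - 1)."""
    result = 1
    for j in range(1, 2 * n, 2):
        result *= j
    return result

for n in range(1, 7):
    print(n, abs_moment_constant(2 * n), double_factorial_odd(n))
```

For $p = 1$ the same constant gives $\mathsf{E}|\xi| = \sigma\sqrt{2/\pi}$.
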


Problem 2.8.10. Prove that if $\xi$ and $\eta$ are two independent random variables, such
that the distribution of $\xi + \eta$ coincides with the distribution of $\xi$, then $\eta = 0$ ($\mathsf{P}$-a.e.).

Hint. Use the fact that if $\eta_1, \ldots, \eta_n$, $n \ge 1$, are independent random variables,
all having the same distribution as $\eta$, then, for any $n \ge 1$, the distribution of
$\xi + \eta_1 + \cdots + \eta_n$ coincides with the distribution of $\xi$.
If $\xi$ and $\eta$ admit moments of all orders, then one can use the relationship between
the semi-invariants $s^{(k)}_{\xi + \eta}$ and $s^{(k)}_{\xi}$, $k \ge 1$ (see [P §2.12]).


Problem 2.8.11. Suppose that the random point $(X, Y)$ is distributed uniformly in
the unit disk $\{(x, y): x^2 + y^2 \le 1\}$, let $W = X^2 + Y^2$, and set

$$U = X \sqrt{-\frac{2 \ln W}{W}}, \qquad V = Y \sqrt{-\frac{2 \ln W}{W}}.$$

Prove that $U$ and $V$ are independent $\mathcal{N}(0, 1)$-distributed random variables.


Problem 2.8.12. Suppose that $U$ and $V$ are two independent random variables that
are uniformly distributed in the interval $(0, 1)$, and set

$$X = \sqrt{-2 \ln V}\, \cos(2\pi U), \qquad Y = \sqrt{-2 \ln V}\, \sin(2\pi U).$$

Prove that $X$ and $Y$ are independent $\mathcal{N}(0, 1)$-distributed random variables.
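
A quick simulation can make the claim plausible, although it proves nothing. The sketch below assumes the standard Box-Muller form with the factor $\sqrt{-2 \ln V}$ and estimates the first two sample moments and the sample covariance of $(X, Y)$. The seed, the sample size, and the function names are illustrative choices.

```python
import math
import random

def box_muller(u, v):
    """Map (u, v) in (0, 1) x (0, 1] to the pair (X, Y) of the transformation above."""
    r = math.sqrt(-2.0 * math.log(v))
    return r * math.cos(2.0 * math.pi * u), r * math.sin(2.0 * math.pi * u)

def sample_moments(n, seed=0):
    """Return (mean_x, mean_y, var_x, var_y, cov_xy) estimated from n simulated pairs."""
    rng = random.Random(seed)
    sx = sy = sxx = syy = sxy = 0.0
    for _ in range(n):
        u = rng.random()
        v = 1.0 - rng.random()      # lies in (0, 1], so log(v) is finite
        x, y = box_muller(u, v)
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    mx, my = sx / n, sy / n
    return mx, my, sxx / n - mx * mx, syy / n - my * my, sxy / n - mx * my

print(sample_moments(200_000))
```

Zero sample covariance is only a necessary sign of independence; the actual proof goes through the joint density.
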


Problem 2.8.13. Consider some positive random variable $R$, which is distributed
according to the Rayleigh law, i.e., has density

$$f_R(r) = \frac{r}{\sigma^2} \exp\Big\{-\frac{r^2}{2\sigma^2}\Big\}, \quad r \ge 0,$$

with some $\sigma^2 > 0$, and suppose that the random variable $\theta$ is uniformly distributed
in the interval $(\alpha, \alpha + 2\pi k)$, where $k \in N = \{1, 2, \ldots\}$ and $\alpha \in [0, 2\pi)$.

Prove that the random variables $X = R \cos\theta$ and $Y = R \sin\theta$ are independent
and distributed with law $\mathcal{N}(0, \sigma^2)$.


Problem 2.8.14. Give an example of two Gaussian random variables, $\xi$ and $\eta$, for
which $\xi + \eta$ is not Gaussian.


Problem 2.8.15. Let $X_1, \ldots, X_n$ be independent and identically distributed random
variables with density $f = f(x)$ and let

$$\mathcal{R}_n = \max(X_1, \ldots, X_n) - \min(X_1, \ldots, X_n)$$

denote the “range” of the sample $(X_1, \ldots, X_n)$. Prove that the density of the random
variable $\mathcal{R}_n$ is given by

$$f_{\mathcal{R}_n}(x) = n(n - 1) \int_{-\infty}^{\infty} [F(y) - F(y - x)]^{n-2} f(y) f(y - x)\, dy, \quad x > 0,$$

where $F(y) = \int_{-\infty}^{y} f(z)\, dz$. In particular, if $X_1, \ldots, X_n$ are uniformly distributed
in the interval $[0, 1]$, then one has

$$f_{\mathcal{R}_n}(x) = \begin{cases} n(n - 1) x^{n-2} (1 - x), & 0 \le x \le 1, \\ 0, & x < 0 \text{ or } x > 1. \end{cases}$$


Problem 2.8.16. Let $F(x)$ be any (cumulative) distribution function. Prove that
for any $a > 0$ the functions

$$G_1(x) = \frac{1}{a} \int_{x}^{x+a} F(u)\, du \quad\text{and}\quad G_2(x) = \frac{1}{2a} \int_{x-a}^{x+a} F(u)\, du$$

are also (cumulative) distribution functions.


Problem 2.8.17. Suppose that $X$ is some exponentially distributed random variable
with parameter $\lambda > 0$, i.e., $X$ has density $f_X(x) = \lambda e^{-\lambda x} I(x \ge 0)$.

(a) Find the density of the distribution law of the random variable $Y = X^{1/\alpha}$,
$\alpha > 0$, which is known as the Weibull distribution.

(b) Find the density of the distribution law of the random variable $Y = \ln X$,
which is known as the double exponential law.

(c) Prove that the integer part and the fractional part of the random variable
$X$, i.e., $\lfloor X \rfloor$ and $\{X\} = X - \lfloor X \rfloor$, are independent random variables. Find the
distribution of $\lfloor X \rfloor$ and $\{X\}$.


Problem 2.8.18. Let $X$ and $Y$ be any two random variables with joint density of
the form $f(x, y) = g\big(\sqrt{x^2 + y^2}\big)$.

Find the density of the joint distribution of the random variables $\rho = \sqrt{X^2 + Y^2}$
and $\theta = \tan^{-1}(Y/X)$, and prove that $\rho$ and $\theta$ are independent.

Setting $U = (\cos\alpha)X + (\sin\alpha)Y$ and $V = (-\sin\alpha)X + (\cos\alpha)Y$, $\alpha \in [0, 2\pi]$,
prove that the joint density of $U$ and $V$ coincides with $f(x, y)$. (This property reflects
the fact that the distribution of the vector $(X, Y)$ is invariant under rotation in $R^2$.)


Problem 2.8.19. Let $X_1, \ldots, X_n$ be independent and identically distributed random
variables with continuous (cumulative) distribution function $F = F(x)$. As this
assumption implies $\mathsf{P}\{X_i = X_j\} = 0$, $i \ne j$ (see Problem 2.8.76 below), it follows
that

$$\mathsf{P}\{X_i = X_j \text{ for some } i \ne j\} = \mathsf{P}\Big(\bigcup_{i < j} \{X_i = X_j\}\Big) \le \sum_{i < j} \mathsf{P}\{X_i = X_j\} = 0.$$

Consequently, one can claim that with probability 1 the numbers $X_1(\omega), \ldots, X_n(\omega)$
can be arranged (and in a unique way) in a strictly increasing sequence. The elements
of this sequence, which we denote by $X_1^{(n)}(\omega), \ldots, X_n^{(n)}(\omega)$, are well-defined
random variables that are commonly referred to as order statistics (see also
[P §3.13] and Problem 1.12.8). Thus, with probability 1 we have

$$X_1^{(n)}(\omega) < \cdots < X_n^{(n)}(\omega)$$

and

$$X_1^{(n)}(\omega) = \min(X_1(\omega), \ldots, X_n(\omega)), \quad \ldots, \quad X_n^{(n)}(\omega) = \max(X_1(\omega), \ldots, X_n(\omega)).$$

In addition, we will suppose that the distribution $F = F(x)$ admits density $f = f(x)$.
Prove that:
(a) The density of $X_k^{(n)}$ is given by

$$n f(x)\, C_{n-1}^{k-1} [F(x)]^{k-1} [1 - F(x)]^{n-k}.$$

(b) The joint density, $f(x_1, \ldots, x_n)$, of $X_1^{(n)}, \ldots, X_n^{(n)}$ is given by

$$f(x_1, \ldots, x_n) = \begin{cases} n!\, f(x_1) \cdots f(x_n), & \text{if } x_1 < \cdots < x_n, \\ 0, & \text{in all other cases}. \end{cases}$$

(c) If $f(x) = I_{[0, 1]}(x)$ (i.e., if the random variables $X_i$ are distributed uniformly
in $[0, 1]$), then

$$\mathsf{E}X_r^{(n)} = \frac{r}{n + 1} \quad\text{and}\quad \mathrm{cov}\big(X_r^{(n)}, X_p^{(n)}\big) = \frac{r(n - p + 1)}{(n + 1)^2 (n + 2)}, \quad r \le p.$$
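
For the uniform case the moments in (c) follow from the density in (a) and the Beta integral $\int_0^1 x^a (1 - x)^b\, dx = a!\, b!/(a + b + 1)!$. The sketch below evaluates the mean and variance of the $k$-th order statistic exactly; the variance formula used in the check is the covariance formula of (c) with $p = r$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(a, b):
    """Integral of x^a (1 - x)^b over [0, 1] for non-negative integers a and b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def order_stat_moment(n, k, power):
    """E (X_k^(n))^power for uniform samples, using the density n C(n-1, k-1) x^(k-1) (1 - x)^(n-k)."""
    return n * comb(n - 1, k - 1) * beta_integral(k - 1 + power, n - k)

n = 5
for k in range(1, n + 1):
    mean = order_stat_moment(n, k, 1)
    var = order_stat_moment(n, k, 2) - mean**2
    print(k, mean, var)
```
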


Problem 2.8.20. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random
variables with normal distribution $\mathcal{N}(m, \sigma^2)$. The quantities $\overline{\xi}$ and $s^2$, given by

$$\overline{\xi} = \frac{1}{n} \sum_{i=1}^{n} \xi_i \quad\text{and}\quad s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (\xi_i - \overline{\xi})^2, \quad n > 1,$$

are known, respectively, as sample mean and sample variance (for the sample
$\xi_1, \ldots, \xi_n$).

Prove that:

(a) $\mathsf{E}s^2 = \sigma^2$.

(b) The sample mean $\overline{\xi}$ and the sample variance $s^2$ are independent.

(c) $\overline{\xi}$ has normal $\mathcal{N}(m, \sigma^2/n)$-distribution, while $(n - 1)s^2/\sigma^2$ has $\chi^2$-distribution
with $(n - 1)$ degrees of freedom.


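Problem 2.8.21. Suppose that $X_1, \ldots, X_n$ are independent and identically distributed
random variables, let $\nu$ be any random variable with values in the set
$\{1, \ldots, n\}$, which is independent from $X_1, \ldots, X_n$, and set $S_\nu = X_1 + \cdots + X_\nu$.
Prove that

$$\mathsf{D}S_\nu = \mathsf{D}X_1\, \mathsf{E}\nu + (\mathsf{E}X_1)^2\, \mathsf{D}\nu, \qquad \frac{\mathsf{D}S_\nu}{\mathsf{E}S_\nu} = \frac{\mathsf{D}X_1}{\mathsf{E}X_1} + \mathsf{E}X_1 \frac{\mathsf{D}\nu}{\mathsf{E}\nu}.$$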
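
Both identities can be confirmed exactly on a small example by building the distribution of the random sum. In the sketch below the laws `x_dist` and `nu_dist` are made-up examples, and the identities are read as $\mathsf{D}S_\nu = \mathsf{D}X_1\, \mathsf{E}\nu + (\mathsf{E}X_1)^2\, \mathsf{D}\nu$ together with the ratio form obtained by dividing by $\mathsf{E}S_\nu = \mathsf{E}X_1\, \mathsf{E}\nu$.

```python
from fractions import Fraction
from itertools import product

x_dist = {1: Fraction(1, 3), 4: Fraction(2, 3)}                       # hypothetical law of X_1
nu_dist = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}   # hypothetical law of nu

def mean_var(dist):
    m = sum(v * p for v, p in dist.items())
    return m, sum((v - m) ** 2 * p for v, p in dist.items())

def random_sum_distribution(x_dist, nu_dist):
    """Distribution of S_nu = X_1 + ... + X_nu, with nu independent of the i.i.d. X_i."""
    out = {}
    for k, pk in nu_dist.items():
        for combo in product(x_dist.items(), repeat=k):
            s = sum(value for value, _ in combo)
            prob = pk
            for _, q in combo:
                prob *= q
            out[s] = out.get(s, 0) + prob
    return out

EX, DX = mean_var(x_dist)
Enu, Dnu = mean_var(nu_dist)
ES, DS = mean_var(random_sum_distribution(x_dist, nu_dist))
print(DS == DX * Enu + EX**2 * Dnu, DS / ES == DX / EX + EX * Dnu / Enu)
```
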


Problem 2.8.22. Let $M(s) = \mathsf{E} e^{sX}$ be the moment generating function for the
random variable $X$ (see Problem 2.6.32). Prove that $\mathsf{P}\{X \ge 0\} \le M(s)$ for
any $s > 0$.


Problem 2.8.23. Let $X, X_1, X_2, \dots$ be independent and identically distributed
random variables and set $S_n = \sum_{i=1}^{n} X_i$, $S_0 = 0$, $M_n = \max_{0 \le j \le n} S_j$, and
$M = \sup_{n \ge 0} S_n$. Prove that ("$\xi \stackrel{d}{=} \eta$" means that the distribution laws of $\xi$ and
$\eta$ coincide):

(a) $M_n \stackrel{d}{=} (M_{n-1} + X)^+$, $n \ge 1$.

(b) If $S_n \to -\infty$ (P-a.e.), then $M \stackrel{d}{=} (M + X)^+$.

(c) If $-\infty < \mathsf{E} X < 0$ and $\mathsf{E} X^2 < \infty$, then

$$\mathsf{E} M = \frac{\mathsf{D} X - \mathsf{D}(M + X)^-}{-2\, \mathsf{E} X}.$$


Problem 2.8.24. Let everything be as in the previous problem and let $M(\varepsilon) =
\sup_{n \ge 0}(S_n - n\varepsilon)$, for $\varepsilon > 0$. Prove that $\lim_{\varepsilon \downarrow 0} \varepsilon M(\varepsilon) = (\mathsf{D} X)/2$.


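Problem 2.8.25. Suppose that $\xi$ and $\eta$ are two independent random variables with
densities $f_\xi(x)$, $x \in R$, and $f_\eta(y) = I_{[0,1]}(y)$, $y \in R$ (i.e., $\eta$ is distributed uniformly
in $[0, 1]$). Prove that in this special case formulas [P §2.8, (36) and (37)] can be
written as

$$f_{\xi\eta}(z) = \begin{cases} \displaystyle\int_z^{\infty} \frac{f_\xi(x)}{x}\, dx, & z \ge 0, \\ 0, & z < 0, \end{cases}$$

and

$$f_{\xi/\eta}(z) = \begin{cases} \displaystyle\int_0^1 x f_\xi(zx)\, dx, & 0 \le z \le 1, \\ \displaystyle z^{-2} \int_0^1 x f_\xi(x)\, dx, & z > 1, \\ 0, & z < 0. \end{cases}$$

In particular, prove that if $\xi$ is also uniformly distributed in $[0, 1]$, then

$$f_{\xi/\eta}(z) = \begin{cases} \dfrac{1}{2}, & 0 \le z \le 1, \\ \dfrac{1}{2z^2}, & z > 1, \\ 0, & z < 0. \end{cases}$$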


Problem 2.8.26. Let $\xi$ and $\eta$ be two independent random variables that are
exponentially distributed with the same parameter $\lambda > 0$.
(a) Prove that the random variable $\frac{\xi}{\xi + \eta}$ is distributed uniformly in $[0, 1]$.
(b) Prove that if $\xi$ and $\eta$ are two independent and exponentially distributed
random variables with parameters, respectively, $\lambda_1$ and $\lambda_2$, $\lambda_1 \ne \lambda_2$, then the density
of $\xi + \eta$ is given by

$$f_{\xi+\eta}(z) = \frac{e^{-z/\lambda_1} - e^{-z/\lambda_2}}{\lambda_1 - \lambda_2}\, I_{(0,\infty)}(z).$$


Problem 2.8.27. Suppose that $\xi$ and $\eta$ are two independent standard normal (i.e.,
$\mathcal{N}(0, 1)$) random variables and prove that:
(a) Both $\xi/\eta$ and $\xi/|\eta|$ have Cauchy distribution with density $\frac{1}{\pi(1+x^2)}$, $x \in R$.
(b) $|\xi|/|\eta|$ has density $\frac{2}{\pi(1+x^2)}$, $x \ge 0$.


Problem 2.8.28. Let $X_1, \dots, X_n$ be independent and exponentially distributed
random variables with parameters, respectively, $\lambda_1, \dots, \lambda_n$, and suppose that $\lambda_i \ne
\lambda_j$, $i \ne j$. Setting $T_n = X_1 + \dots + X_n$, prove that the probability $\mathsf{P}\{T_n > t\}$ can
be expressed in the form

$$\mathsf{P}\{T_n > t\} = \sum_{i=1}^{n} a_{in} e^{-\lambda_i t}$$

and find the coefficients $a_{in}$, $i = 1, \dots, n$. (Comp. with Problem 2.8.8.)


Problem 2.8.29. Let $\xi_1, \xi_2, \dots$ be any sequence of random variables with $\mathsf{E}\xi_n = 0$,
$\mathsf{E}\xi_n^2 = 1$ and let $S_n = \xi_1 + \dots + \xi_n$. Prove that for any positive $a$ and $b$ one has

$$\mathsf{P}\{S_n \ge an + b \text{ for some } n \ge 1\} \le \frac{1}{1 + ab}.$$


Problem 2.8.30. Suppose that the random variable $\xi$ takes values in the finite set
$\{x_1, \dots, x_k\}$. Prove that

$$\lim_{n \to \infty} (\mathsf{E}\xi^n)^{1/n} = \max(x_1, \dots, x_k).$$
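The convergence can be observed numerically. The following is a small illustrative sketch, under the extra assumption (not stated in the problem) that the values are strictly positive; the function name `moment_root` and the sample law are hypothetical.

```python
def moment_root(values, probs, n):
    """Compute (E xi^n)^(1/n) for a finite law with positive values.

    The maximum is factored out first so that large n does not overflow.
    """
    m = max(values)
    scaled = sum(p * (v / m) ** n for v, p in zip(values, probs))
    return m * scaled ** (1.0 / n)

values, probs = [1.0, 2.0, 5.0], [0.5, 0.4, 0.1]
for n in (1, 10, 100, 1000, 5000):
    print(n, moment_root(values, probs, n))
```

The printed values increase towards the largest value of the law; the rate is governed by the probability of that largest value.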


Problem 2.8.31. Suppose that $\xi$ and $\eta$ are two independent random variables that
take values in the set $\{1, 2, \dots\}$ and are such that either $\mathsf{E}\xi < \infty$, or $\mathsf{E}\eta < \infty$. Prove
that

$$\mathsf{E} \min(\xi, \eta) = \sum_{k=1}^{\infty} \mathsf{P}\{\xi \ge k\}\, \mathsf{P}\{\eta \ge k\}.$$


Problem 2.8.32. Let $\xi_1$ and $\xi_2$ be any two independent and exponentially distributed
random variables with parameters, respectively, $\lambda_1$ and $\lambda_2$. Find the
distribution functions of the random variables $\frac{\xi_1}{\xi_1 + \xi_2}$ and $\frac{\xi_1 + \xi_2}{\xi_1}$.


Problem 2.8.33. Let $X$ and $Y$ be two random matrices and suppose that $\mathsf{E} YY^*$
is invertible. Prove the following matrix version of the Cauchy-Bunyakovsky
inequality:

$$(\mathsf{E} XY^*)(\mathsf{E} YY^*)^{-1}(\mathsf{E} YX^*) \le \mathsf{E} XX^*,$$

where the relation $\le$ is understood as the difference between the right and the left
side being non-negative definite.


Problem 2.8.34. (L. Shepp.) Suppose that $X$ is a Bernoulli random variable with
$\mathsf{P}\{X = 1\} = p$, $\mathsf{P}\{X = 0\} = 1 - p$.

(a) Prove that one can find a random variable $Y$ which is independent from $X$
and is such that $X + Y$ has symmetric distribution ($X + Y \stackrel{d}{=} -(X + Y)$).

(b) Among all random variables Y that have the above property, find the one that 
has the smallest variance DY. 


Problem 2.8.35. Suppose that the random variable $U$ is uniformly distributed in
the interval $(0, 1)$. Prove that:

(a) Given any $\lambda > 0$, $-\frac{1}{\lambda} \ln U$ is exponentially distributed with parameter $\lambda$.

(b) $\tan \pi(U - \frac{1}{2})$ is distributed according to the Cauchy law with density
$\frac{1}{\pi(1+x^2)}$, $x \in R$.

(c) $[nU] + 1$ is distributed uniformly in the (discrete) set $\{1, 2, \dots, n\}$.

(d) Given any $0 < q < 1$, the random variable $X = 1 + \left[\frac{\ln U}{\ln q}\right]$ has geometric
distribution with $\mathsf{P}\{X = k\} = q^{k-1}(1 - q)$, $k \ge 1$.
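The four transformations above are exactly what one uses in practice to generate random numbers from a uniform source. The sketch below writes them down in Python; the function names and the helper `open_uniform` are illustrative assumptions, and the empirical frequency printed at the end is only a rough check of part (d).

```python
import math
import random

def exponential(u, lam):
    """Part (a): -(1/lam) * ln U."""
    return -math.log(u) / lam

def cauchy(u):
    """Part (b): tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (u - 0.5))

def discrete_uniform(u, n):
    """Part (c): [nU] + 1, with [.] the integer part."""
    return math.floor(n * u) + 1

def geometric(u, q):
    """Part (d): 1 + [ln U / ln q]."""
    return 1 + math.floor(math.log(u) / math.log(q))

def open_uniform(rng):
    """Draw U from the open interval (0, 1); random() alone may return 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

rng = random.Random(1)
q = 0.5
sample = [geometric(open_uniform(rng), q) for _ in range(100000)]
freq_one = sum(1 for k in sample if k == 1) / len(sample)
print(freq_one, 1 - q)  # empirical P{X = 1} next to q^0 (1 - q)
```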


Problem 2.8.36. Give an example of a sequence of independent and identically 
distributed random variables {X,, X2, ...} with sup, EX? < oo, for all p > 0, but 
such that 


P| oe Xn; < oo} =0 


for any sub-sequence {n1, n2, ...}. 


Problem 2.8.37. Let $\xi$ and $\eta$ be any two independent random variables with (cumulative)
distribution functions $F = F(x)$ and $G = G(x)$. Since $\mathsf{P}\{\max(\xi, \eta) \le
x\} = \mathsf{P}\{\xi \le x\}\, \mathsf{P}\{\eta \le x\}$, it is easy to see that the distribution function of $\max(\xi, \eta)$
is nothing but $F(x)G(x)$. Give an alternative proof of the last claim by identifying
the event $\{\max(\xi, \eta) \le x\}$ with the union of the events $\{\xi \le x, \xi \ge \eta\}$ and
$\{\eta \le x, \xi < \eta\}$, and by expressing the probabilities of these events in terms of
the conditional probabilities (in the final stage use formula (68) from § 6).


Problem 2.8.38. Suppose that $\xi$ and $\eta$ are two independent random variables
whose product $\xi\eta$ is distributed with Poisson law of parameter $\lambda > 0$. Prove that
one of the variables $\xi$ and $\eta$ takes values in the set $\{0, 1\}$.


Problem 2.8.39. Prove that, given a standard normal, i.e., $\mathcal{N}(0, 1)$, random
variable $\xi$, one has the following asymptotic result (see Problem 1.6.9):

$$\mathsf{P}\{\xi \ge x\} \sim \frac{\varphi(x)}{x} \quad \text{as } x \to \infty, \quad \text{where } \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$

What is the analog of this result for a random variable $\gamma$ which has gamma-distribution
(see [P §2.3, Table 3])?


Problem 2.8.40. Let $\xi$ be any random variable and let $M_\xi$ be the collection of all
medians of $\xi$, as defined in part (a) of Problem 1.4.23. This object is meaningful
for arbitrary random variables (Problem 1.4.23 refers to discrete random variables).
Prove that for any $b \in R$ and any $p \ge 1$ with $\mathsf{E}|\xi|^p < \infty$, one must have

$$|\mu - b|^p \le 2\, \mathsf{E}|\xi - b|^p,$$

where $\mu = \mu(\xi) \in M_\xi$ is a median for $\xi$. (In particular, if $\mathsf{E}|\xi|^2 < \infty$, then
$|\mu - \mathsf{E}\xi| \le \sqrt{2\, \mathsf{D}\xi}$.)


Problem 2.8.41. Suppose that $\xi$ and $\eta$ are two independent random variables with
finite second moments. Prove that the random variables $\xi + \eta$ and $\xi - \eta$ are
uncorrelated if and only if $\mathsf{D}\xi = \mathsf{D}\eta$.


Problem 2.8.42. Given any two $L^1$-functions, $f$ and $g$, their convolution, $f * g$,
is defined as $(f * g)(x) = \int_R f(y)\, g(x - y)\, dy$. Prove Young's inequality:

$$\int_R |(f * g)(x)|\, dx \le \int_R |f(x)|\, dx \cdot \int_R |g(x)|\, dx.$$


Problem 2.8.43. According to formula [P §2.8, (22)] the density, $f_\eta(y)$, of the
random variable $\eta = \varphi(\xi)$ can be connected with the density, $f_\xi(x)$, of the random
variable $\xi$ by the relation

$$f_\eta(y) = f_\xi(h(y))\, |h'(y)|,$$

where $h(y) = \varphi^{-1}(y)$.

Suppose that $I$ is some open subset of $R^n$ and that $y = \varphi(x)$ is some $R^n$-valued
function defined on $I$ (for $x = (x_1, \dots, x_n) \in I$ and $y = (y_1, \dots, y_n) \in R^n$, the
relation $y = \varphi(x)$ is understood as $y_i = \varphi_i(x_1, \dots, x_n)$, $i = 1, \dots, n$). Suppose
that all derivatives $\frac{\partial \varphi_i}{\partial x_j}$ exist and are continuous and that $|J_\varphi(x)| > 0$, $x \in I$, where
$J_\varphi(x)$ stands for the determinant of the Jacobian of $\varphi$, i.e.,

$$J_\varphi(x) = \det \left\| \frac{\partial \varphi_i}{\partial x_j} \right\|, \quad 1 \le i, j \le n.$$

Prove that if $\xi = (\xi_1, \dots, \xi_n)$ is some $I$-valued random vector with density $f_\xi(x)$
and if $\eta = \varphi(\xi)$, then the density $f_\eta(y)$ is well defined on the set $\varphi(I) = \{y : y =
\varphi(x),\ x \in I\}$ and can be written as

$$f_\eta(y) = f_\xi(h(y))\, |J_h(y)|,$$

where $h = \varphi^{-1}$ is the inverse of the function $\varphi$ (we have $|J_h(y)| > 0$, as long as
$|J_\varphi(x)| > 0$).
Hint. Use the multivariate analog of the integration-by-substitution rule (Problem
2.6.74), with $g(x) = G(\varphi(x)) f_\xi(x)$, for some appropriate function $G$.


Problem 2.8.44. Let $\eta = A\xi + B$, where: $\xi = (\xi_1, \dots, \xi_n)$, $\eta = (\eta_1, \dots, \eta_n)$, $A$ is
an $n \times n$-matrix with $|\det A| > 0$, and $B$ is an $n$-dimensional vector. Prove that

$$f_\eta(y) = \frac{1}{|\det A|}\, f_\xi\big(A^{-1}(y - B)\big).$$

Hint. Use the result established in Problem 2.8.43 with $\varphi(x) = Ax + B$ and
prove that $|J_{\varphi^{-1}}(y)| = 1/|\det A|$.


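Problem 2.8.45. (a) Let $\rho(\xi, \eta)$ be the correlation coefficient between two given
random variables, $\xi$ and $\eta$. Prove that

$$\rho(c_1 \xi + c_2,\ c_3 \eta + c_4) = \rho(\xi, \eta) \cdot \mathrm{sign}(c_1 c_3),$$

where $\mathrm{sign}\, x = 1$ for $x > 0$, $\mathrm{sign}\, x = 0$ for $x = 0$ and $\mathrm{sign}\, x = -1$ for $x < 0$.
(b) Consider the random variables $\xi_1, \xi_2, \xi_3, \xi_4$ with correlation coefficients
$\rho(\xi_i, \xi_j)$, $i \ne j$, and prove that

$$\rho(\xi_1 + \xi_2,\ \xi_3 + \xi_4) = [\rho(\xi_1, \xi_3) + \rho(\xi_1, \xi_4)] + [\rho(\xi_2, \xi_3) + \rho(\xi_2, \xi_4)].$$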


Problem 2.8.46. Let $X = (X_1, \dots, X_n)$ be any Gaussian random vector whose
components are independent and identically distributed with $X_i \sim \mathcal{N}(0, \sigma^2)$, $i =
1, \dots, n$. Consider the spherical coordinates, $\{R, \Phi_1, \dots, \Phi_{n-1}\}$, of the vector $X =
(X_1, \dots, X_n)$; in other words, suppose that

$$X_1 = R \sin \Phi_1,$$
$$X_m = R \sin \Phi_m \cos \Phi_{m-1} \cdots \cos \Phi_1, \quad 2 \le m \le n - 1,$$
$$X_n = R \cos \Phi_{n-1} \cos \Phi_{n-2} \cdots \cos \Phi_1,$$

where $R \ge 0$, $\Phi_i \in [0, 2\pi)$, $1 \le i \le n - 1$. Prove that, for $r \ge 0$, $\varphi_i \in [0, 2\pi)$, $i =
1, \dots, n - 1$, $n \ge 2$, the joint density, $f(r, \varphi_1, \dots, \varphi_{n-1})$, of the random variables
$(R, \Phi_1, \dots, \Phi_{n-1})$ is given by

$$f(r, \varphi_1, \dots, \varphi_{n-1}) = \frac{r^{n-1} \exp\left(-\frac{r^2}{2\sigma^2}\right)}{(2\pi)^{n/2} \sigma^n}\, \cos^{n-2} \varphi_1 \cos^{n-3} \varphi_2 \cdots \cos \varphi_{n-2},$$

where we set by convention $\varphi_0 = 0$.


Problem 2.8.47. Suppose that $X$ is a random variable with values in the interval
$[0, 1]$ and such that the distribution of $X$ is given by the Cantor function (see § 3).
Compute all moments $\mathsf{E} X^n$, $n \ge 1$.


Problem 2.8.48. (a) Verify that each of the following functions is a (cumulative)
distribution function:

$$F_G(x) = \exp(-e^{-x}), \quad x \in R;$$

$$F_F(x) = \begin{cases} 0, & x < 0, \\ \exp(-x^{-\alpha}), & x \ge 0, \end{cases} \quad \text{where } \alpha > 0;$$

$$F_W(x) = \begin{cases} \exp(-|x|^{\alpha}), & x < 0, \\ 1, & x \ge 0, \end{cases} \quad \text{where } \alpha > 0.$$

These functions are known, respectively, as the Gumbel distribution, or double
exponential distribution; comp. with Problem 2.8.17 ($F_G(x)$), Fréchet distribution
($F_F(x)$), and Weibull distribution ($F_W(x)$).

These distributions are special cases of the following three types (everywhere
below we suppose that $\mu \in R$, $\sigma > 0$, and $\alpha > 0$):

Type 1 (Gumbel-type distributions):

$$F_G(x) = \exp\left\{ -e^{-\frac{x - \mu}{\sigma}} \right\}.$$

Type 2 (Fréchet-type distribution):

$$F_F(x) = \begin{cases} 0, & x < \mu, \\ \exp\left\{ -\left( \dfrac{x - \mu}{\sigma} \right)^{-\alpha} \right\}, & x \ge \mu. \end{cases}$$

Type 3 (Weibull-type distribution):

$$F_W(x) = \begin{cases} \exp\left\{ -\left( -\dfrac{x - \mu}{\sigma} \right)^{\alpha} \right\}, & x < \mu, \\ 1, & x \ge \mu. \end{cases}$$

(b) Prove that if $X$ has Type 2 distribution, then the random variable

$$Y = \ln(X - \mu)$$

has Type 1 distribution. Similarly, if $X$ has Type 3 distribution, then

$$Y = -\ln(\mu - X)$$

also must have Type 1 distribution.


Remark. This explains why Type 1 distributions, which are often referred to as 
extreme value distributions, are fundamental in the “extreme value theory”. 


Problem 2.8.49. (Factorial moments.) Given some random variable $X$, its factorial
moments are defined as

$$m_{(r)} = \mathsf{E} X(X - 1) \cdots (X - r + 1), \quad r = 1, 2, \dots,$$

i.e., $m_{(r)} = \mathsf{E}(X)_r$.
If $X$ has Poisson distribution law of parameter $\lambda$, then for $r = 3$ one has
$m_{(3)} = \lambda^3$. Calculate $m_{(r)}$ for an arbitrary $r$.
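Before attempting the general case, it can help to tabulate a few factorial moments numerically. The following sketch sums the Poisson series directly; the function name `poisson_factorial_moment` and the truncation level `terms` are assumptions made for illustration, and only the case $r = 3$ stated in the problem is compared against its known value.

```python
import math

def poisson_factorial_moment(lam, r, terms=200):
    """Truncated series for E X(X-1)...(X-r+1), X Poisson with parameter lam."""
    total = 0.0
    pmf = math.exp(-lam)              # P{X = 0}
    for k in range(terms):
        falling = 1.0
        for j in range(r):
            falling *= (k - j)        # k (k-1) ... (k-r+1); zero when k < r
        total += falling * pmf
        pmf *= lam / (k + 1)          # P{X = k+1} obtained from P{X = k}
    return total

lam = 2.5
print(poisson_factorial_moment(lam, 3), lam ** 3)   # the case r = 3 from the problem
for r in (1, 2, 4, 5):
    print(r, poisson_factorial_moment(lam, r))       # data for guessing the general formula
```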


Problem 2.8.50. Suppose that $\theta_1$ and $\theta_2$ are two independent random variables that
are distributed uniformly in $[0, 2\pi)$, and let $X_1 = \cos \theta_1$ and $X_2 = \cos \theta_2$.
Prove that

$$\frac{1}{2}(X_1 + X_2) \stackrel{\mathrm{law}}{=} X_1 X_2$$

(recall that "$\stackrel{\mathrm{law}}{=}$," or "$\stackrel{d}{=}$," means "identical in law").


Problem 2.8.51. The random variable $\theta$ is distributed uniformly in the interval
$[0, 2\pi)$ and the random variable $C$ is distributed in the real line $R$ according to
the Cauchy law with density $\frac{1}{\pi(1+x^2)}$, $x \in R$.

(a) Prove that the random variables $\cos^2 \theta$ and $1/(1 + C^2)$ share the same
distribution law (i.e., $\cos^2 \theta \stackrel{\mathrm{law}}{=} 1/(1 + C^2)$).

(b) Prove that $\cot \frac{\theta}{2} \stackrel{\mathrm{law}}{=} C$.

(c) Find the densities of the distribution laws of the random variables $\sin(\theta + \varphi)$,
$\varphi \in R$, and $\alpha \tan \theta$, $\alpha > 0$.


Problem 2.8.52. Let $\xi$ be any exponentially distributed random variable with
$\mathsf{P}\{\xi \ge t\} = e^{-t}$, $t \ge 0$, and let $N$ be any standard normal random variable (i.e.,
$N \sim \mathcal{N}(0, 1)$), which is independent from $\xi$. Prove that

$$\xi \stackrel{\mathrm{law}}{=} \sqrt{2\xi}\, |N|,$$

i.e., the distribution law of $\xi$ coincides with the distribution law of $\sqrt{2\xi}\, |N|$.


Problem 2.8.53. Suppose that $X$ is a random variable that takes values in the set
$\{0, 1, \dots, N\}$ and has binomial moments $b_0, b_1, \dots, b_N$, given by $b_k = \mathsf{E}\, C_X^k =
\frac{1}{k!} \mathsf{E}(X)_k = \frac{1}{k!} \mathsf{E} X(X - 1) \cdots (X - k + 1)$ (see Sect. A.3).
Prove that the moment generating function of $X$ is given by

$$G_X(s) = \mathsf{E} s^X = \sum_{k=0}^{N} b_k (s - 1)^k = \sum_{n=0}^{N} s^n \left( \sum_{k=n}^{N} (-1)^{k-n} C_k^n b_k \right),$$

and that, consequently, for any $n = 0, 1, \dots, N$, one has

$$\mathsf{P}\{X = n\} = \sum_{k=n}^{N} (-1)^{k-n} C_k^n b_k.$$


Problem 2.8.54. Suppose that $X$ and $Y$ are two independent random variables that
are distributed uniformly in the interval $[0, 1]$. Prove that the random variable

$$Z = \begin{cases} X + Y, & 0 \le X + Y \le 1, \\ (X + Y) - 1, & 1 < X + Y \le 2, \end{cases}$$

is also uniformly distributed in $[0, 1]$.


Problem 2.8.55. Suppose that $X_1, \dots, X_n$ are independent and identically distributed
random variables that take the values 0, 1 and 2 with probability 1/3 each.
Find a general formula for the probabilities

$$P_n(k) = \mathsf{P}\{X_1 + \dots + X_n = k\}, \quad 0 \le k \le 2n$$

(for example, $P_n(0) = 3^{-n}$, $P_n(1) = n 3^{-n}$, $P_n(2) = C_{n+1}^2 3^{-n}$, $P_n(3) = (C_{n+2}^3 -
n) \cdot 3^{-n}$, and so on).
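The listed special cases can be reproduced by repeated convolution, which also gives a table against which a conjectured general formula may be tested. The sketch below is illustrative only; the function name `counts` is an assumption.

```python
from math import comb

def counts(n):
    """c[k] = number of ways to write k as an ordered sum of n terms from {0, 1, 2}."""
    c = [1]
    for _ in range(n):
        nxt = [0] * (len(c) + 2)
        for k, v in enumerate(c):
            for step in (0, 1, 2):
                nxt[k + step] += v
        c = nxt
    return c

n = 5
c = counts(n)                       # P_n(k) = c[k] / 3**n
print(c)
print(c[:4], [1, n, comb(n + 1, 2), comb(n + 2, 3) - n])
```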


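Problem 2.8.56. The random variables $\xi$ and $\eta$ are such that $\mathsf{E}\xi^2 < \infty$ and $\mathsf{E}\eta^2 <
\infty$. Prove that:

(a) $\mathsf{D}(\xi + \eta) = \mathsf{D}\xi + \mathsf{D}\eta + 2\, \mathrm{cov}(\xi, \eta)$;

(b) If, in addition, $\xi$ and $\eta$ are independent, then

$$\mathsf{D}(\xi\eta) = \mathsf{D}\xi \cdot \mathsf{D}\eta + \mathsf{D}\xi \cdot (\mathsf{E}\eta)^2 + \mathsf{D}\eta \cdot (\mathsf{E}\xi)^2.$$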


(See Problem 2.8.69.) 


Problem 2.8.57. The joint density, $f(x, y)$, for the pair of random variables
$(X, Y)$, is said to be "spherically symmetric" if it can be expressed as

$$f(x, y) = g(x^2 + y^2),$$

for some choice of the probability density function $g = g(z)$, $z \ge 0$. Assuming that
$R$ and $\theta$ represent the polar coordinates of $(X, Y)$, i.e., $X = R \cos \theta$, $Y = R \sin \theta$,
prove that $\theta$ is uniformly distributed in $[0, 2\pi)$, while $R$ is distributed with density
$h(r) = 2\pi r g(r^2)$.


Problem 2.8.58. Given a pair of random variables, $(X, Y)$, with density $f(x, y)$,
consider the complex random variables

$$Z_t = Z e^{it}, \quad t \in R, \quad \text{where } Z = X + iY.$$

Prove that in order to claim that the distribution law of $Z_t$ does not depend on $t \in R$
it is necessary to assume that $f(x, y)$ has the form $f(x, y) = g(x^2 + y^2)$, where,
just as in the previous problem, $g$ is some probability density function.


Problem 2.8.59. Let $\xi$ and $\eta$ be any two independent and exponentially distributed
random variables with density $f(x) = \lambda e^{-\lambda x}$, $x \ge 0$. Prove that the random
variables $\xi + \eta$ and $\frac{\xi}{\eta}$ are independent.


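Problem 2.8.60. Suppose that $\xi$ and $\eta$ are two independent random variables with
densities

$$f_\xi(x) = \frac{1}{\pi \sqrt{1 - x^2}}, \quad |x| < 1, \quad \text{and} \quad f_\eta(y) = \frac{y}{\sigma^2}\, e^{-\frac{y^2}{2\sigma^2}}, \quad y > 0, \quad \sigma > 0.$$

Prove that the random variable $\xi\eta$ is normally, $\mathcal{N}(0, \sigma^2)$-distributed.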


Problem 2.8.61. Consider the random matrix $\|\xi_{ij}\|$ of size $n \times n$, whose (random)
entries are such that $\mathsf{P}\{\xi_{ij} = \pm 1\} = 1/2$. Prove that the expected value and the
variance of the determinant of this random matrix are equal, respectively, to 0 and $n!$.
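For small $n$ the claim can be confirmed by brute force. The sketch below enumerates all sign matrices with equal weights, which corresponds to the assumption that the entries are independent (an assumption made here explicitly for the sketch); the function names are illustrative.

```python
from itertools import permutations, product
from math import factorial

def det(m):
    """Determinant via the Leibniz formula (adequate for very small matrices)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def det_mean_and_variance(n):
    """Mean and variance of det over all 2^(n*n) equally likely sign matrices."""
    dets = []
    for entries in product((-1, 1), repeat=n * n):
        m = [entries[i * n:(i + 1) * n] for i in range(n)]
        dets.append(det(m))
    mean = sum(dets) / len(dets)
    var = sum(d * d for d in dets) / len(dets) - mean ** 2
    return mean, var

for n in (1, 2, 3):
    print(n, det_mean_and_variance(n), factorial(n))
```

The enumeration grows as $2^{n^2}$, so it is practical only for $n \le 4$ or so.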


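Problem 2.8.62. Suppose that $X_1, X_2, \dots$ are independent random variables that
are distributed uniformly in the interval $[0, 1]$. Prove that

$$\mathsf{E} \left[ \sum_{k=1}^{n} X_k^2 \right] \left[ \sum_{k=1}^{n} X_k \right]^{-1} \to \frac{2}{3} \quad \text{as } n \to \infty.$$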


Problem 2.8.63. Suppose that the random vector $X = (X_1, X_2, X_3)$ is distributed
uniformly in the tetrahedron

$$\mathcal{X}_3 = \{(x_1, x_2, x_3) : x_1 \ge 0,\ x_2 \ge 0,\ x_3 \ge 0,\ x_1 + x_2 + x_3 \le c\},$$

where $c > 0$ is some fixed constant. Find the marginal distributions of the random
vector $X = (X_1, X_2, X_3)$, associated with the components $X_1$ and $(X_1, X_2)$.

Hint. The density, $f(x_1, x_2, x_3)$, of the vector $X = (X_1, X_2, X_3)$ is equal to the
constant $V^{-1}$, where $V = c^3/6$ is the volume of $\mathcal{X}_3$. With this observation in mind,
prove that the density of $(X_1, X_2)$ is given by $f(x_1, x_2) = 6(c - x_1 - x_2)/c^3$ and
then calculate the density of $X_1$.


Problem 2.8.64. Let $X_1, \dots, X_n$ be positive, independent and identically distributed
random variables with $\mathsf{E} X_1 = \mu$, $\mathsf{E} X_1^{-1} = r$ and $S_m = X_1 + \dots + X_m$,
$1 \le m \le n$. Prove that:

(a) $\mathsf{E} S_m^{-1} \le r$;

(b) $\mathsf{E} X_i S_m^{-1} = 1/m$, if $i \le m$;

(c) $\mathsf{E} S_m S_n^{-1} = m/n$, if $m \le n$;

(d) $\mathsf{E} S_n S_m^{-1} = 1 + (n - m) \mu\, \mathsf{E} S_m^{-1}$, if $m < n$.


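Problem 2.8.65. (Dirichlet distribution.) In [P §2.3, Table 3] the beta-distribution,
with parameters $\alpha > 0$ and $\beta > 0$, is defined as a probability distribution on $[0, 1]$
with density

$$f(x; \alpha, \beta) = \frac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)},$$

where

$$B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\, dx \quad \left( = \frac{\Gamma(\alpha) \Gamma(\beta)}{\Gamma(\alpha + \beta)} \ \text{ with } \ \Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\, dx \right).$$

The Dirichlet distribution is a multivariate analog of the beta-distribution and is
defined as the probability distribution on the set

$$\Delta_{k-1} = \{(x_1, \dots, x_{k-1}) : x_i \ge 0,\ 0 \le x_1 + \dots + x_{k-1} \le 1\}, \quad \text{for } k \ge 2,$$

given by the density

$$f(x_1, \dots, x_{k-1}; \alpha_1, \dots, \alpha_{k-1}, \alpha_k) = \frac{\Gamma(\alpha_1 + \dots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)}\, x_1^{\alpha_1 - 1} \cdots x_{k-1}^{\alpha_{k-1} - 1} \big(1 - (x_1 + \dots + x_{k-1})\big)^{\alpha_k - 1},$$

where $\alpha_i > 0$, $i = 1, \dots, k$, are given parameters. Alternatively, the Dirichlet
distribution can be defined on the simplex $\{(x_1, \dots, x_k) : x_i \ge 0,\ \sum_{i=1}^{k} x_i = 1\}$ by
specifying the "density"

$$\tilde f(x_1, \dots, x_k; \alpha_1, \dots, \alpha_{k-1}, \alpha_k) = \frac{\Gamma(\alpha_1 + \dots + \alpha_k)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)} \prod_{i=1}^{k} x_i^{\alpha_i - 1}$$

(the quotation marks around the word "density" are simply a reference to the
fact that the function $\tilde f(x_1, \dots, x_k; \alpha_1, \dots, \alpha_{k-1}, \alpha_k)$ does not represent a density
relative to the Lebesgue measure in $R^k$).

Suppose that all components of the random vector $X = (X_1, \dots, X_k)$ are non-negative,
i.e., $X_i \ge 0$, and are such that $X_1 + \dots + X_k = 1$, and that $X$ has
Dirichlet distribution with density $f(x_1, \dots, x_{k-1}; \alpha_1, \dots, \alpha_{k-1}, \alpha_k)$ on $\Delta_{k-1}$ (in
the sense that this function represents the joint density of the first $k - 1$ components,
$X_1, \dots, X_{k-1}$, of the vector $(X_1, \dots, X_k)$, after eliminating the last component, $X_k$,
from the relation $X_k = 1 - (X_1 + \dots + X_{k-1})$).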
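(a) Prove that

$$\mathsf{E} X_j = \frac{\alpha_j}{\sum_{i=1}^{k} \alpha_i}, \qquad \mathsf{D} X_j = \frac{\alpha_j \left( \sum_{i=1}^{k} \alpha_i - \alpha_j \right)}{\left( \sum_{i=1}^{k} \alpha_i \right)^2 \left( \sum_{i=1}^{k} \alpha_i + 1 \right)},$$

$$\mathrm{cov}(X_{j_1}, X_{j_2}) = -\frac{\alpha_{j_1} \alpha_{j_2}}{\left( \sum_{i=1}^{k} \alpha_i \right)^2 \left( \sum_{i=1}^{k} \alpha_i + 1 \right)}, \quad j_1 \ne j_2.$$

(b) Prove that for every choice of non-negative integer numbers, $r_1, \dots, r_k$, one
can write

$$\mathsf{E} X_1^{r_1} \cdots X_k^{r_k} = \frac{\Gamma\left( \sum_{i=1}^{k} \alpha_i \right) \prod_{i=1}^{k} \Gamma(\alpha_i + r_i)}{\prod_{i=1}^{k} \Gamma(\alpha_i)\ \Gamma\left( \sum_{i=1}^{k} (\alpha_i + r_i) \right)}.$$

(c) Find the conditional density, $f_{X_k \mid X_1, \dots, X_{k-1}}(x_k \mid x_1, \dots, x_{k-1})$, of the random
variable $X_k$, given $X_1, \dots, X_{k-1}$.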


Problem 2.8.66. The concentration function of a random variable $X$ is defined as

$$Q(X; l) = \sup_{x \in R} \mathsf{P}\{x \le X \le x + l\}, \quad l \ge 0.$$

Prove that:
(a) If $X$ and $Y$ are two independent random variables, then

$$Q(X + Y; l) \le \min(Q(X; l), Q(Y; l)), \quad \text{for all } l \ge 0.$$

(b) There is a number $x_l^*$, for which one can write $Q(X; l) = \mathsf{P}\{x_l^* \le X \le
x_l^* + l\}$, and the distribution function of $X$ can be claimed to be continuous if and
only if $Q(X; 0) = 0$.

Hint. (a) If $F_X$ and $F_Y$ stand for the distribution functions of $X$ and $Y$, then

$$\mathsf{P}\{z < X + Y \le z + l\} = \int_{-\infty}^{\infty} [F_X(z + l - y) - F_X(z - y)]\, dF_Y(y).$$


Problem 2.8.67. Suppose that $\xi \sim \mathcal{N}(m, \sigma^2)$, i.e., $\xi$ has normal distribution with
parameters $m$ and $\sigma^2$, and consider the random variable $\eta = e^{\xi}$, which has log-normal
distribution with density (see formula [P §2.8, (23)])

$$f_\eta(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\, \sigma y} \exp\left\{ -\dfrac{(m - \ln y)^2}{2\sigma^2} \right\}, & y > 0, \\ 0, & y \le 0. \end{cases}$$

Given any $a \in [-1, 1]$ define the function

$$f^{(a)}(y) = \begin{cases} f_\eta(y) \left[ 1 - a \sin\left( \pi \sigma^{-2} (m - \ln y) \right) \right], & y > 0, \\ 0, & y \le 0. \end{cases}$$

(a) Prove that $f^{(a)}(y)$ is a probability density function, i.e., $f^{(a)}(y) \ge 0$ and
$\int_0^{\infty} f^{(a)}(y)\, dy = 1$.
(b) Suppose that $\zeta$ is some random variable with density $f_\zeta(y) = f^{(a)}(y)$, $a \ne
0$, and prove that $\eta$ and $\zeta$ have identical moments of all orders: $\mathsf{E}\eta^n = \mathsf{E}\zeta^n$, $n \ge 1$.
(This shows that the log-normal distribution admits moments of all orders, and yet
this distribution is not uniquely determined by its moments.)


Problem 2.8.68. Let $(\xi_n)_{n \ge 1}$ be any sequence of independent, identically, and
symmetrically distributed random variables, and, given any $n \ge 1$, let $S_0 = 0$
and $S_n = \xi_1 + \dots + \xi_n$. Define the respective sequences of partial maximums and
partial minimums, $M = (M_n)_{n \ge 0}$ and $m = (m_n)_{n \ge 0}$, given by

$$M_n = \max(S_0, S_1, \dots, S_n) \quad \text{and} \quad m_n = \min(S_0, S_1, \dots, S_n).$$

As a generalization of Problem 1.10.7, prove that for any fixed $n$ one has

$$(M_n - S_n,\ S_n - m_n,\ S_n) \stackrel{\mathrm{law}}{=} (-m_n,\ M_n,\ S_n) \stackrel{\mathrm{law}}{=} (M_n,\ -m_n,\ -S_n);$$

i.e., the joint distribution laws of the above triplets of random variables coincide.
Hint. Use the following relation, which is easy to verify:

$$(S_n - S_{n-k};\ k \le n) \stackrel{\mathrm{law}}{=} (S_k;\ k \le n) \quad \text{for any } n \ge 1.$$


Problem 2.8.69. The random variables $\xi$ and $\eta$ are such that $\mathsf{D}\xi < \infty$ and $\mathsf{D}\eta < \infty$.
Prove that

$$\mathrm{cov}^2(\xi, \eta) \le \mathsf{D}\xi\, \mathsf{D}\eta$$

and explain when equality holds in this relation.


Problem 2.8.70. Let $\xi_1, \dots, \xi_n$ be independent and identically distributed random
variables. Prove that

$$\mathsf{P}\{\min(\xi_1, \dots, \xi_n) = \xi_1\} = n^{-1}.$$

Show also that the random variables $\min(\xi_1, \dots, \xi_n)$ and $I_{\{\xi_1 = \min(\xi_1, \dots, \xi_n)\}}$ are
independent.


Problem 2.8.71. Let $X$ be any random variable with distribution function $F =
F(x)$ and let $C$ be any constant. Find the distribution functions for the following
random variables:

$$X \vee C = \max(X, C), \qquad X \wedge C = \min(X, C), \qquad X^C = \begin{cases} X, & \text{if } |X| \le C, \\ 0, & \text{if } |X| > C. \end{cases}$$


Problem 2.8.72. Let $X$ be any random variable, let $\lambda > 0$ and let $\varphi(x) = \frac{x}{1 + x}$.
Prove the following inequalities:

$$\mathsf{E}\left[ \varphi(|X|^{\lambda}) - \varphi(x^{\lambda}) \right] \le \mathsf{P}\{|X| \ge x\} \le \frac{\mathsf{E} \varphi(|X|^{\lambda})}{\varphi(x^{\lambda})}.$$


Problem 2.8.73. Let $\xi$ and $\eta$ be two independent random variables that have
gamma-distribution with parameters, respectively, $(\alpha_1, \beta)$ and $(\alpha_2, \beta)$ (see [P §2.3,
Table 3]). Prove that:

(a) The random variables $\xi + \eta$ and $\frac{\xi}{\xi + \eta}$ are independent.

(b) The random variable $\frac{\xi}{\xi + \eta}$ has beta distribution with parameters $(\alpha_1, \alpha_2)$
(see also [P §2.3, Table 3]).


Problem 2.8.74. (Bernoulli Scheme with random probability for success.) Suppose
that the random variables $\xi_1, \dots, \xi_n$ and $\pi$ are chosen so that $\pi$ is uniformly
distributed in $(0, 1)$, $\xi_i$, $i = 1, \dots, n$, take values 1 and 0 with conditional
probabilities

$$\mathsf{P}(\xi_i = 1 \mid \pi = p) = p, \qquad \mathsf{P}(\xi_i = 0 \mid \pi = p) = 1 - p,$$

and, furthermore, are conditionally independent, in the sense that (in what follows
$x_i$ stands for a number that is either 0 or 1, for $i = 1, \dots, n$)

$$\mathsf{P}(\xi_1 = x_1, \dots, \xi_n = x_n \mid \pi) = \mathsf{P}(\xi_1 = x_1 \mid \pi) \cdots \mathsf{P}(\xi_n = x_n \mid \pi).$$

Prove that:
(a) One has the identity

$$\mathsf{P}\{\xi_1 = x_1, \dots, \xi_n = x_n\} = \frac{1}{(n + 1)\, C_n^x},$$

where $x = x_1 + \dots + x_n$.
(b) The random variable $S_n = \xi_1 + \dots + \xi_n$ is uniformly distributed in the
(discrete) set $\{0, 1, \dots, n\}$.
(c) The conditional distributions $\mathsf{P}(\pi \le p \mid \xi_1 = x_1, \dots, \xi_n = x_n)$ and $\mathsf{P}(\pi \le
p \mid S_n = x_1 + \dots + x_n)$ coincide, for any $p \in (0, 1)$.

(d) The conditional distribution $\mathsf{P}(\pi \le p \mid S_n = x)$, where $x = x_1 + \dots + x_n$,
has density

$$f_{\pi \mid S_n}(p \mid x) = (n + 1)\, C_n^x\, p^x (1 - p)^{n - x},$$

and one has $\mathsf{E}(\pi \mid S_n = x) = \frac{x + 1}{n + 2}$.


Problem 2.8.75. Let $\xi$ and $\eta$ be two non-negative, independent and identically
distributed random variables with $\mathsf{P}\{\xi = 0\} < 1$, and suppose that $\min(\xi, \eta)$ and
$\xi/2$ have the same distribution. Prove that $\xi$ and $\eta$ must be exponentially distributed.
Hint. Consider the relation

$$(\mathsf{P}\{\xi > x\})^2 = \mathsf{P}\{\min(\xi, \eta) > x\} = \mathsf{P}\{\xi > 2x\},$$

and conclude that $(\mathsf{P}\{\xi > x\})^{2^n} = \mathsf{P}\{\xi > 2^n x\}$. Then conclude that for every
$a > 0$ and for every non-negative rational $x$ one has $\mathsf{P}\{\xi > x\} = e^{-\lambda x / a}$, where
$\lambda = -\ln \mathsf{P}\{\xi > a\}$. Finally, conclude that $\mathsf{P}\{\xi > x\} = e^{-\lambda x / a}$, for all $x \ge 0$.


Problem 2.8.76. Let $\xi$ and $\eta$ be two independent and identically distributed random
variables with distribution function $F = F(x)$. Prove that

$$\mathsf{P}\{\xi = \eta\} = \sum_{x \in R} |F(x) - F(x-)|^2.$$


(Comp. with Problem 2.12.20.) 


Problem 2.8.77. Consider the random variables $X_1, \dots, X_n$ and prove the following
"inclusion-exclusion" formula (for the maximum of several random variables;
comp. with Problems 1.1.12 and 1.4.9):

$$\max(X_1, \dots, X_n) = \sum_{i=1}^{n} X_i - \sum_{1 \le i_1 < i_2 \le n} \min(X_{i_1}, X_{i_2}) + \sum_{1 \le i_1 < i_2 < i_3 \le n} \min(X_{i_1}, X_{i_2}, X_{i_3}) + \dots + (-1)^{n+1} \min(X_1, \dots, X_n).$$

By choosing the random variables $X_1, \dots, X_n$ accordingly, prove the "inclusion-exclusion"
formula for the probability $\mathsf{P}(A_1 \cup \dots \cup A_n)$ (see again Problems 1.1.12
and 1.4.9).
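Since the formula is a pointwise identity between numbers, it can be tested directly on arbitrary finite lists. The sketch below does this with integers so that the comparison is exact; the function name `max_via_mins` is an illustrative assumption.

```python
from itertools import combinations
import random

def max_via_mins(xs):
    """Right-hand side of the inclusion-exclusion formula for max(xs)."""
    total = 0
    for size in range(1, len(xs) + 1):
        sign = 1 if size % 2 == 1 else -1      # (-1)^(size + 1)
        total += sign * sum(min(sub) for sub in combinations(xs, size))
    return total

rng = random.Random(4)
for _ in range(5):
    xs = [rng.randint(-50, 50) for _ in range(rng.randint(1, 6))]
    print(xs, max_via_mins(xs), max(xs))
```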


Problem 2.8.78. Let $\xi_1, \xi_2, \dots$ be any sequence of independent and identically
distributed Bernoulli random variables with $\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$
1 - 
and let z,(w) = =} se) and Z99(@) = limy—+oo Zn(@). Prove that the 


distribution function $F(x) = \mathsf{P}\{z_\infty(\omega) \le x\}$ is the Cantor function (see [P §2.3]).
In particular, this means that $\mathrm{Law}(z_\infty)$ refers to a probability distribution concentrated
on the Cantor set. The random variable $z_\infty = z_\infty(\omega)$ is an example of what is
known as a fractal random variable (its distribution is neither discrete nor absolutely
continuous; see Problem 2.3.18).


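Problem 2.8.79. Prove that in the binomial case (see [P §1.2, 1] and [P §2.3,
Table 2]) the distribution function

$$B_n(m; p) = \sum_{k=0}^{m} C_n^k p^k q^{n-k}$$

can be expressed in terms of the (incomplete) beta-function:

$$B_n(m; p) = \frac{1}{B(m + 1, n - m)} \int_p^1 x^m (1 - x)^{n - m - 1}\, dx,$$

where

$$B(p, q) = \int_0^1 x^{p-1} (1 - x)^{q-1}\, dx \quad \left( = \frac{\Gamma(p) \Gamma(q)}{\Gamma(p + q)} \ \text{ with } \ \Gamma(p) = \int_0^{\infty} x^{p-1} e^{-x}\, dx \right).$$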


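Problem 2.8.80. Prove that the Poisson distribution function $F(m; \lambda) =
\sum_{k=0}^{m} \frac{e^{-\lambda} \lambda^k}{k!}$, $m = 0, 1, 2, \dots$, can be expressed in terms of the (incomplete)
gamma-function as

$$F(m; \lambda) = \frac{1}{m!} \int_{\lambda}^{\infty} x^m e^{-x}\, dx.$$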


Problem 2.8.81. In addition to the mean and the variance, other important
characteristics of the shape of the density $f = f(x)$ are the "skewness" parameter,
given by

$$\alpha_3 = \frac{\mu_3}{\sigma^3},$$

and the "kurtosis" or "peakedness" parameter, given by

$$\alpha_4 = \frac{\mu_4}{\sigma^4},$$

where $\mu_k = \int (x - \mu)^k f(x)\, dx$, $\mu = \int x f(x)\, dx$, and $\sigma^2 = \mu_2$.
What are the values of the parameters $\alpha_3$ and $\alpha_4$ for the distributions listed in
[P §2.3, Table 3]?


Problem 2.8.82. Suppose that $X$ is some binomial random variable with parameters
$n$ and $p$ (see Table 2 in [P §2.3, 1]). Analogously to Problem 2.8.81, define
the "skewness" parameter

$$\mathrm{skw}(X) = \alpha_3 = \frac{\mathsf{E}(X - \mathsf{E} X)^3}{(\mathsf{D} X)^{3/2}}$$

and prove that (with $q = 1 - p$)

$$\mathrm{skw}(X) = \frac{q - p}{\sqrt{npq}}.$$

(If $0 < p < 1/2$, then $\mathrm{skw}(X) > 0$, in which case one says that the distribution
has "long right tail".) Find also the value of the "kurtosis" parameter $\mathrm{kur}(X) = \alpha_4$
$\left( = \frac{\mathsf{E}(X - \mathsf{E} X)^4}{(\mathsf{D} X)^2} \right)$.


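Problem 2.8.83. Suppose that $\xi_1, \dots, \xi_n$ are independent and identically distributed
random variables with "skewness" parameter $\alpha_3$ ($= \mathrm{skw}(\xi_1)$) and with
"kurtosis" parameter $\alpha_4$ ($= \mathrm{kur}(\xi_1)$). Prove that

$$\mathrm{skw}(\xi_1 + \dots + \xi_n) = n^{-1/2}\, \mathrm{skw}(\xi_1)$$

and

$$\mathrm{kur}(\xi_1 + \dots + \xi_n) = 3 + n^{-1} \{\mathrm{kur}(\xi_1) - 3\}.$$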
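A binomial random variable is a sum of independent Bernoulli variables, so the two scaling relations above can be checked against the binomial law directly. The following sketch does this numerically; the helper names `binomial_pmf` and `skw_kur` and the values of `n` and `p` are illustrative assumptions.

```python
from math import comb

def binomial_pmf(n, p):
    """Law of the number of successes in n independent trials."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

def skw_kur(pmf):
    """Skewness mu_3 / sigma^3 and kurtosis mu_4 / sigma^4 of a finite law."""
    mean = sum(x * w for x, w in pmf.items())

    def central(k):
        return sum((x - mean) ** k * w for x, w in pmf.items())

    var = central(2)
    return central(3) / var ** 1.5, central(4) / var ** 2

n, p = 12, 0.3
skw1, kur1 = skw_kur(binomial_pmf(1, p))   # a single Bernoulli summand
skwn, kurn = skw_kur(binomial_pmf(n, p))   # the sum of n such summands
print(skwn, skw1 / n ** 0.5)
print(kurn, 3 + (kur1 - 3) / n)
```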


Problem 2.8.84. The well known binomial distribution arises as the distribution
law of the total number of "successes," $\nu$, in $n$ independent trials, with probability
for success in each individual trial $0 < p < 1$. More precisely, this distribution
can be identified with the collection of probabilities $\mathsf{P}_n\{\nu = r\} = C_n^r p^r q^{n-r}$,
$r = 0, \dots, n$, for some fixed integer $n$ and fixed $0 < p < 1$. The negative binomial
distribution $\mathsf{P}^r\{\tau = k\}$ (a.k.a. the Pascal distribution) arises as the probability
distribution of the trial, $\tau$, during which $r$ "successes" are observed for the first
time. Prove that, for any $r = 1, 2, \dots$ and any $k \ge r$, one has

$$\mathsf{P}^r\{\tau = k\} = C_{k-1}^{r-1} p^r q^{k-r}, \quad k = r, r + 1, \dots,$$

where $p$ is the probability for success in a single trial. The negative binomial
distribution can be identified with the collection of all probabilities $\mathsf{P}^r\{\tau = k\}$,
$k = r, r + 1, \dots$, for fixed $r$. Given any fixed $r$, prove that $\mathsf{E}^r \tau = rq/p$, where
$q = 1 - p$.

Problem 2.8.85. The (discrete) random variable $\xi$, with values in the set $\{1, 2, \dots\}$,
is said to have a discrete Pareto law with parameter $\rho > 0$, if

$$\mathsf{P}\{\xi = k\} = \frac{c}{k^{\rho + 1}}, \quad k \ge 1.$$

Prove that

$$c = \frac{1}{\zeta(\rho + 1)} \quad \text{and} \quad \mathsf{E}\xi = \frac{\zeta(\rho)}{\zeta(\rho + 1)},$$

where $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$ stands for Riemann's zeta function (for a description of the
continuous Pareto law see Problem 3.6.23).


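Problem 2.8.86. Let $\xi_1, \dots, \xi_n$ be independent and identically distributed random
variables with distribution function $F(x \mid \theta)$, which depends on some (random)
parameter $\theta$, with prior distribution $\Pi(\theta)$, known to be in some class $\mathscr{F}$. Let
$\Pi(\theta \mid x_1, \dots, x_n)$ be the posterior distribution, calculated from the Bayes formula,
where $x_1, \dots, x_n$ are the observed values of $\xi_1, \dots, \xi_n$. If the posterior distribution
also belongs to the class $\mathscr{F}$, we say that the distribution $\Pi(\theta)$ is the $\mathscr{F}$-conjugate
of the distribution $F(x \mid \theta)$.
Prove that:

(a) If $F(\cdot \mid \theta) \sim \mathcal{N}(\theta, a_0^{-1})$ and $\Pi(\theta) \sim \mathcal{N}(m_0, b_0^{-1})$, then

$$\Pi(\theta \mid x_1, \dots, x_n) \sim \mathcal{N}\left( \frac{b_0 m_0 + a_0 (x_1 + \dots + x_n)}{b_0 + n a_0},\ \frac{1}{b_0 + n a_0} \right).$$

(b) If $F(\cdot \mid \theta) \sim \mathcal{N}(0, \theta^{-1})$ and $\Pi(\theta) \sim \Gamma(k; \lambda)$ = gamma-distribution with
density

$$\gamma_{(k; \lambda)}(x) = \frac{\lambda^k x^{k-1} e^{-\lambda x}}{\Gamma(k)}\, I_{(0, \infty)}(x),$$

where $k > 0$ and $\lambda > 0$, then

$$\Pi(\theta \mid x_1, \dots, x_n) \sim \Gamma\left( k + \frac{n}{2};\ \lambda + \frac{1}{2}(x_1^2 + \dots + x_n^2) \right).$$

(c) If $F(\cdot \mid \theta) \sim \exp(\theta)$ = exponential distribution with parameter $\theta$, and
$\Pi(\theta) \sim \Gamma(k; \lambda)$, then

$$\Pi(\theta \mid x_1, \dots, x_n) \sim \Gamma(k + n;\ \lambda + (x_1 + \dots + x_n)).$$

(d) If $F(\cdot \mid \theta) \sim \mathrm{Poisson}(\theta)$ and $\Pi(\theta) \sim \Gamma(k; \lambda)$, then

$$\Pi(\theta \mid x_1, \dots, x_n) \sim \Gamma(k + (x_1 + \dots + x_n);\ \lambda + n).$$

(e) If $F(\cdot \mid \theta) \sim \mathrm{Bernoulli}(\theta)$ and $\Pi(\theta) \sim B(k; L)$ = beta-distribution with
density

$$\beta_{(k; L)}(x) = \frac{x^{k-1} (1 - x)^{L-1}}{B(k, L)}\, I_{(0, 1)}(x),$$

then

$$\Pi(\theta \mid x_1, \dots, x_n) \sim B(k + (x_1 + \dots + x_n);\ L + n - (x_1 + \dots + x_n)).$$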
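The update rules stated in parts (a), (c), (d) and (e) amount to simple parameter arithmetic, which the following sketch writes out. The function names are hypothetical, and the code only restates the claimed formulas; it is not a proof of them. One consequence worth checking is that updating on the data in two batches gives the same posterior parameters as updating once on all of it.

```python
def normal_update(m0, b0, a0, xs):
    """Part (a): returns the posterior mean and precision (variance = 1 / precision)."""
    precision = b0 + len(xs) * a0
    mean = (b0 * m0 + a0 * sum(xs)) / precision
    return mean, precision

def gamma_exponential_update(k, lam, xs):
    """Part (c): exponential observations, gamma prior on the parameter."""
    return k + len(xs), lam + sum(xs)

def gamma_poisson_update(k, lam, xs):
    """Part (d): Poisson observations, gamma prior on the parameter."""
    return k + sum(xs), lam + len(xs)

def beta_bernoulli_update(k, L, xs):
    """Part (e): Bernoulli observations (0 or 1), beta prior on the parameter."""
    s = sum(xs)
    return k + s, L + len(xs) - s

data = [1, 0, 1, 1, 0, 1]
batch = beta_bernoulli_update(2, 3, data)
stepwise = beta_bernoulli_update(*beta_bernoulli_update(2, 3, data[:2]), data[2:])
print(batch, stepwise)
```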
Problem 2.8.87. Suppose that X is a random variable with one of the following
distributions: binomial, Poisson, geometric, negative-binomial, or Pareto. Find the 
probability of the event {X is even}. If, for example, X has geometric distribution 
with parameter p (see [ P §2.3, Table 2]), then P{X is even} = (1 — p)/(2— p). 


Problem 2.8.88. (Exponentially distributed random variables.) Let $\xi_1, \dots, \xi_n$ be
independent and exponentially distributed random variables with parameters,
respectively, $\lambda_1, \dots, \lambda_n$.

(a) Prove that $\mathsf{P}\{\xi_1 < \xi_2\} = \lambda_1 / (\lambda_1 + \lambda_2)$.

(b) Prove that $\min_{1 \le i \le n} \xi_i$ has exponential distribution with parameter $\lambda =
\sum_{i=1}^{n} \lambda_i$, and conclude from part (a) that

$$\mathsf{P}\Big\{ \xi_j = \min_{1 \le i \le n} \xi_i \Big\} = \frac{\lambda_j}{\sum_{k=1}^{n} \lambda_k}.$$

(c) Assuming that $\lambda_i \ne \lambda_j$, $i \ne j$, find the density of the random variable
$\xi_1 + \dots + \xi_n$ (for the case $n = 2$, see Problem 2.8.26).

(d) Prove that $\mathsf{E} \min(\xi_1, \xi_2) = 1/(\lambda_1 + \lambda_2)$ and find $\mathsf{E} \max(\xi_1, \xi_2)$.

(e) Find the distribution density of the random variable $\xi_1 - \xi_2$.

(f) Prove that the random variables $\min(\xi_1, \xi_2)$ and $\xi_1 - \xi_2$ are independent.
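Parts (a) and (d) can be checked by simulation. The sketch below assumes that the parameter $\lambda$ is the rate of the exponential law (density $\lambda e^{-\lambda x}$), which is the convention of Python's `random.expovariate`; the rates, sample size and seed are illustrative choices.

```python
import random

rng = random.Random(2)
lam1, lam2 = 1.0, 3.0
trials = 200000

wins = 0          # number of trials with xi_1 < xi_2
min_sum = 0.0     # running sum of min(xi_1, xi_2)
for _ in range(trials):
    x1 = rng.expovariate(lam1)
    x2 = rng.expovariate(lam2)
    if x1 < x2:
        wins += 1
    min_sum += min(x1, x2)

freq = wins / trials
mean_min = min_sum / trials
print(freq, lam1 / (lam1 + lam2))        # part (a)
print(mean_min, 1 / (lam1 + lam2))       # part (d)
```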


2.9 Construction of Stochastic Processes with a Given 
System of Finite-Dimensional Distributions 


Problem 2.9.1. Let $\Omega = [0, 1]$, let $\mathscr{F}$ be the class of Borel sets in $[0, 1]$, and let
$\mathsf{P}$ stand for the Lebesgue measure on $[0, 1]$. Prove that $(\Omega, \mathscr{F}, \mathsf{P})$ is a universal
probability space, in the sense that, given any distribution function $F(x)$, one can
construct on $(\Omega, \mathscr{F}, \mathsf{P})$ a random variable $\xi = \xi(\omega)$, $\omega \in \Omega$, whose distribution
function, $F_\xi(x) = \mathsf{P}\{\xi \le x\}$, coincides with $F(x)$.

Hint. Set $\xi(\omega) = F^{-1}(\omega)$, where $F^{-1}(\omega) = \sup\{x : F(x) < \omega\}$, $0 < \omega < 1$
($\xi(0)$ and $\xi(1)$ may be chosen arbitrarily).
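For a distribution concentrated on finitely many atoms, the construction in the hint is easy to carry out explicitly: $\sup\{x : F(x) < \omega\}$ is the smallest atom at which $F$ reaches the level $\omega$. The sketch below implements this special case only; the function name `quantile_transform` and the example law are illustrative assumptions.

```python
from bisect import bisect_left
from itertools import accumulate
import random

def quantile_transform(atoms, probs):
    """Return xi(omega) = sup{x : F(x) < omega} for a law on sorted atoms."""
    cum = list(accumulate(probs))            # values of F at the atoms
    def xi(omega):
        i = bisect_left(cum, omega)          # first atom with F(atom) >= omega
        return atoms[min(i, len(atoms) - 1)] # guard against rounding in cum[-1]
    return xi

xi = quantile_transform([-1, 0, 2], [0.2, 0.5, 0.3])
rng = random.Random(3)
sample = [xi(rng.random()) for _ in range(100000)]   # omega drawn from Lebesgue measure
freqs = {a: sample.count(a) / len(sample) for a in (-1, 0, 2)}
print(freqs)
```

The empirical frequencies should be close to the prescribed probabilities, in line with the claim that $\xi$ has distribution function $F$.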


Problem 2.9.2. Verify the consistency of the families of probability distributions 
described in the corollaries to [ P §2.9, Theorems 1 and 2]. 


Problem 2.9.3. Prove that Corollary 2 to [ P §2.9, Theorem 2] can be derived from 
[P §2.9, Theorem 1]. 

Hint. Show that the measures defined in [ P §2.9, (16)] form a consistent family 
of (finite-dimensional) distributions. 


Problem 2.9.4. Consider the random variables $T_n$, $n \ge 1$, from [P §2.9, 4] and
let $F_n$, $n \ge 1$, denote their respective distribution functions. Prove that $F_{n+1}(t) =
\int_0^t F_n(t - s)\, dF(s)$, $n \ge 1$, where $F_1 = F$.


Problem 2.9.5. Prove that $\mathsf{P}\{N_t = n\} = F_n(t) - F_{n+1}(t)$ (see [P §2.9, 4]).


Problem 2.9.6. Prove that the renewal function $m(t)$ from [P §2.9, 4] satisfies
what is known as the recovery equation:

$$m(t) = F(t) + \int_0^t m(t - x)\, dF(x). \tag{$*$}$$


Problem 2.9.7. Prove that the function defined by formula [P §2.9, (20)] is the 
only solution to equation (*), within the class of functions that are bounded on 
every finite interval. 


Problem 2.9.8. Let $T$ be an arbitrary set.

(a) Suppose that for every $t \in T$ there is a probability space $(\Omega_t, \mathscr{F}_t, \mathsf{P}_t)$, and let
$\Omega = \prod_{t \in T} \Omega_t$ and $\mathscr{F} = \bigotimes_{t \in T} \mathscr{F}_t$. Prove that there is a unique probability measure
$\mathsf{P}$, defined on $(\Omega, \mathscr{F})$, for which the following independence property holds:

$$\mathsf{P}\Big( \prod_{t \in T} B_t \Big) = \prod_{t \in T} \mathsf{P}_t(B_t),$$

where $B_t \in \mathscr{F}_t$, $t \in T$, and $B_t = \Omega_t$ for all but finitely many indices $t \in T$.
Hint. Define $\mathsf{P}$ on some appropriate algebra and use the argument of the
proof of the Ionescu-Tulcea Theorem.

(b) Suppose that for every $t \in T$ there is a measurable space $(E_t, \mathscr{E}_t)$ and a
probability measure $\mathsf{P}_t$ defined on that space. Prove the following result, which is
due to Łomnicki and Ulam: there is a probability space $(\Omega, \mathscr{F}, \mathsf{P})$ and independent
random elements $(X_t)_{t \in T}$, such that each $X_t$ is $\mathscr{F}/\mathscr{E}_t$-measurable and $\mathsf{P}\{X_t \in B\} =
\mathsf{P}_t(B)$, $B \in \mathscr{E}_t$.


2.10 Various Types of Convergence of Sequences 
of Random Variables 


Problem 2.10.1. By using [P §2.10, Theorem 5] prove that in [P §2.6, Theorems 3
and 4] one can replace "convergence almost surely" with "convergence in
probability".

Hint. If $\xi_n \xrightarrow{P} \xi$, $|\xi_n| \le \eta$, $E\eta < \infty$, and $E|\xi_n - \xi| \nrightarrow 0$, then one can find
some $\varepsilon > 0$ and a sub-sequence $(n_k)_{k \ge 1}$, such that $E|\xi_{n_k} - \xi| > \varepsilon$ and $\xi_{n_k} \xrightarrow{P} \xi$.
Furthermore, according to [P §2.10, Theorem 5], one can find such a sub-sequence
$(k_i)_{i \ge 1}$, that $\xi_{n_{k_i}} \to \xi$ (P-a.e.). The next step is to use [P §2.6, Theorem 3] to find
a contradiction to the assumption $E|\xi_n - \xi| \nrightarrow 0$.


Problem 2.10.2. Prove that the space $L^\infty$ is complete.
Hint. Take a sequence $(\xi_k)_{k \ge 1}$, which is fundamental in $L^\infty$, in the sense that
$\|\xi_m - \xi_n\|_{L^\infty} \le a_n$, for $n \le m$, with $a_n \to 0$ as $n \to \infty$, and set

\[
\xi(\omega) =
\begin{cases}
\lim_n \xi_n(\omega), & \text{if } \lim_n \xi_n(\omega) < \infty, \\
0, & \text{if } \lim_n \xi_n(\omega) = \infty.
\end{cases}
\]

Prove that, as defined above, $\xi(\omega)$ is a well defined random variable and, furthermore,
$\|\xi - \xi_n\|_{L^\infty} \le a_n \to 0$ as $n \to \infty$.


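Problem 2.10.3. Prove that if $\xi_n \xrightarrow{P} \xi$ and, at the same time, $\xi_n \xrightarrow{P} \eta$, then $\xi$ and $\eta$ are
equivalent, in the sense that $P\{\xi \ne \eta\} = 0$.


Problem 2.10.4. Let $\xi_n \xrightarrow{P} \xi$ and $\eta_n \xrightarrow{P} \eta$ and suppose that the random variables $\xi$ and
$\eta$ are equivalent. Prove that for any $\varepsilon > 0$ one has

\[ P\{|\xi_n - \eta_n| \ge \varepsilon\} \to 0 \quad \text{as } n \to \infty. \]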


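Problem 2.10.5. Let $\xi_n \xrightarrow{P} \xi$ and $\eta_n \xrightarrow{P} \eta$. Prove that if $\varphi = \varphi(x, y)$ is some
continuous function, then $\varphi(\xi_n, \eta_n) \xrightarrow{P} \varphi(\xi, \eta)$. (Slutsky's lemma.)
Hint. Given some $\varepsilon > 0$, choose $c > 0$ so that

\[
P\{|\xi_n| > c\} < \varepsilon, \quad P\{|\eta_n| > c\} < \varepsilon, \quad n \ge 1, \qquad
P\{|\xi| > c\} < \varepsilon, \quad P\{|\eta| > c\} < \varepsilon.
\]

As the function $\varphi = \varphi(x, y)$ is continuous, it must be uniformly continuous on
the compact $[-c, c] \times [-c, c]$. Therefore one can find some $\delta > 0$ so that for any
$x, y \in [-c, c] \times [-c, c]$ with $\rho(x, y) < \delta$, one has $|\varphi(x) - \varphi(y)| \le \varepsilon$ (here
$\rho(x, y) = \max(|x^1 - y^1|, |x^2 - y^2|)$, $x = (x^1, x^2)$, $y = (y^1, y^2)$). Finally, consider the estimate

\[
\begin{aligned}
P\{|\varphi(\xi_n, \eta_n) - \varphi(\xi, \eta)| > \varepsilon\} \le{}& P\{|\xi_n| > c\} + P\{|\eta_n| > c\} \\
&+ P\{|\xi| > c\} + P\{|\eta| > c\} + P\{|\xi_n - \xi| > \delta\} + P\{|\eta_n - \eta| > \delta\},
\end{aligned}
\]

and prove that for large $n$ the right side does not exceed $6\varepsilon$.


Problem 2.10.6. Let $(\xi_n - \xi)^2 \xrightarrow{P} 0$. Prove that $\xi_n^2 \xrightarrow{P} \xi^2$.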


Problem 2.10.7. Prove that if $\xi_n \xrightarrow{d} C$, where $C$ is some constant, then the
convergence must hold also in probability; in other words

\[ \xi_n \xrightarrow{d} C \implies \xi_n \xrightarrow{P} C. \]

Hint. For a given $\varepsilon > 0$, consider the function $f_\varepsilon(x) = (1 - |x - C|/\varepsilon)^+$ and notice
that

\[ P\{|\xi_n - C| < \varepsilon\} \ge E f_\varepsilon(\xi_n) \to E f_\varepsilon(C) = 1. \]


Problem 2.10.8. Let the sequence $(\xi_n)_{n \ge 1}$ be such that $\sum_{n \ge 1} E|\xi_n|^p < \infty$, for
some $p > 0$. Prove that $\xi_n \to 0$ (P-a.e.).
Hint. Use Chebyshev's inequality and the Borel-Cantelli lemma.


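Problem 2.10.9. Let $(\xi_n)_{n \ge 1}$ be a sequence of identically distributed random
variables. Prove the following implications:

\[
E|\xi_1| < \infty \iff \sum_{n=1}^{\infty} P\{|\xi_1| > \varepsilon n\} < \infty,\ \varepsilon > 0
\implies \sum_{n=1}^{\infty} P\Big\{\Big|\frac{\xi_n}{n}\Big| > \varepsilon\Big\} < \infty,\ \varepsilon > 0
\implies \frac{\xi_n}{n} \to 0 \ \text{(P-a.e.)}.
\]

Hint. Use the following easy to verify inequalities:

\[
\varepsilon \sum_{n=1}^{\infty} P\{|\xi_1| > \varepsilon n\} \le E|\xi_1| \le \varepsilon + \varepsilon \sum_{n=1}^{\infty} P\{|\xi_1| > \varepsilon n\}.
\]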


Problem 2.10.10. Let $(\xi_n)_{n \ge 1}$ be some sequence of random variables and let $\xi$ be
a random variable.

(a) Prove that if $P\{|\xi_n - \xi| \ge \varepsilon \text{ i.o.}\} = 0$ for every $\varepsilon > 0$, then $\xi_n \to \xi$ (P-a.e.).

(b) Prove that if one can find a sub-sequence $(n_k)$, such that $\xi_{n_k} \to \xi$ (P-a.e.)
and $\max_{n_{k-1} < l \le n_k} |\xi_l - \xi_{n_{k-1}}| \to 0$ (P-a.e.) for $k \to \infty$, then $\xi_n \to \xi$ (P-a.e.).

(c) Prove that if $\xi_n \to \xi$ (P-a.e.), then $P\{|\xi_n - \xi| \ge \varepsilon \text{ i.o.}\} = 0$, for every $\varepsilon > 0$.
(This is the converse of property (a).)


Problem 2.10.11. Define the distance, $d(\xi, \eta)$, between two random variables, $\xi$
and $\eta$, as

\[ d(\xi, \eta) = E\,\frac{|\xi - \eta|}{1 + |\xi - \eta|}, \]

and prove that the function $d = d(\cdot, \cdot)$ defines a metric in the space of all
equivalence classes of random variables (on a given probability space) for the
relation "identity almost everywhere." Prove that convergence in probability is
equivalent to convergence in the metric $d(\cdot, \cdot)$.

Hint. Check the triangle inequality and convince yourself that

\[
\frac{\varepsilon}{1 + \varepsilon}\, P\{|\xi_n - \xi| > \varepsilon\} \le E\,\frac{|\xi_n - \xi|}{1 + |\xi_n - \xi|}
\le \varepsilon\, P\{|\xi_n - \xi| \le \varepsilon\} + P\{|\xi_n - \xi| > \varepsilon\},
\]

for every $\varepsilon > 0$.


Problem 2.10.12. Prove that the topology of convergence almost surely is not
metrizable.
Hint. Suppose that there is a metric, $\rho$, which defines convergence almost surely,
and consider some sequence $(\xi_n)_{n \ge 1}$, chosen so that $\xi_n \xrightarrow{P} 0$, but $\xi_n \nrightarrow 0$ (P-a.e.).
Then, for some $\varepsilon > 0$, one can find a sub-sequence $(n_k)_{k \ge 1}$ so that $\rho(\xi_{n_k}, 0) > \varepsilon$
and, at the same time, $\xi_{n_k} \xrightarrow{P} 0$. Finally, by using [P §2.10, Theorem 5] one can
find a contradiction to the claim that convergence in the metric $\rho$ is the same as
convergence almost surely.


Problem 2.10.13. Prove that if $X_1 \le X_2 \le \ldots$ and $X_n \xrightarrow{P} X$, then one also has
$X_n \to X$ (P-a.e.).


Problem 2.10.14. Let $(X_n)_{n \ge 1}$ be a sequence of random variables. Prove that:
(a) $X_n \to 0$ (P-a.e.) $\implies S_n/n \to 0$ (P-a.e.), where, as usual, $S_n = X_1 + \cdots + X_n$.

(b) $X_n \xrightarrow{L^p} 0 \implies S_n/n \xrightarrow{L^p} 0$, if $p \ge 1$ and, in general,
$S_n/n \stackrel{L^p}{\nrightarrow} 0$.

(c) In general, $X_n \xrightarrow{P} 0$ does not imply the convergence $S_n/n \xrightarrow{P} 0$ (comp. with
the last statement in Problem 2.10.34).

(d) $S_n/n \to 0$ (P-a.e.) if and only if $S_n/n \xrightarrow{P} 0$ and $S_{2^n}/2^n \to 0$ (P-a.e.).


Problem 2.10.15. Let $(\Omega, \mathscr{F}, P)$ be a probability space, on which one has the
convergence $X_n \xrightarrow{P} X$. Prove that if $P$ is an atomic measure, then $X_n \to X$ also
with probability 1 (for the definition of atomic measure, see Problem 2.3.35).


Problem 2.10.16. According to part (a) in the Borel-Cantelli lemma (the "first
Borel-Cantelli lemma"), if $\sum_{n=1}^{\infty} P\{|\xi_n| > \varepsilon\} < \infty$ for every $\varepsilon > 0$, then $\xi_n \to 0$
(P-a.e.). Give an example of a sequence $\{\xi_n\}$ for which $\xi_n \to 0$ (P-a.e.), and yet
$\sum_{n=1}^{\infty} P\{|\xi_n| > \varepsilon\} = \infty$, for some $\varepsilon > 0$.


Problem 2.10.17. (On part (b) in the Borel-Cantelli lemma; i.e., on the "second
Borel-Cantelli lemma.") Let $\Omega = (0, 1)$, $\mathscr{F} = \mathscr{B}((0, 1))$, and let $P$ stand
for the Lebesgue measure. Consider the events $A_n = (0, 1/n)$ and prove that
$\sum_{n=1}^{\infty} P(A_n) = \infty$, even though every $\omega \in (0, 1)$ can belong only to finitely many
sets $A_1, \ldots, A_{[1/\omega]}$, i.e., $P\{A_n \text{ i.o.}\} = 0$.
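
The two competing facts in this problem are easy to observe numerically. The sketch below is an illustration only (the function names `partial_sum` and `hits` are invented for it): it prints partial sums of $\sum P(A_n) = \sum 1/n$, which keep growing, and counts for a few sample points $\omega$ how many of the sets $A_n = (0, 1/n)$ contain them.

```python
import math

def partial_sum(n):
    """Sum of P(A_k) = 1/k for k = 1..n, where A_k = (0, 1/k)."""
    return sum(1.0 / k for k in range(1, n + 1))

def hits(omega, n_max):
    """Number of indices n <= n_max with omega in A_n = (0, 1/n)."""
    return sum(1 for n in range(1, n_max + 1) if 0 < omega < 1.0 / n)

for omega in (0.3, 0.07, 0.011):
    # omega lies in A_n exactly when n < 1/omega
    print(omega, hits(omega, 10_000), math.ceil(1 / omega) - 1)

print(partial_sum(10), partial_sum(10_000))
```

The count of hits stops changing once $n$ exceeds $1/\omega$, while the partial sums grow like $\ln n$; the events $A_n$ are nested, hence far from independent, which is why the second Borel-Cantelli lemma does not apply.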


Problem 2.10.18. Prove that in the second Borel-Cantelli lemma, instead of requiring
that the events $A_1, A_2, \ldots$ are independent, it is enough to require only that these
events are pair-wise independent, in that $P(A_i \cap A_j) - P(A_i)P(A_j) = 0$, $i \ne j$; in
fact, it is enough to require only that $A_1, A_2, \ldots$ are pair-wise negatively correlated,
in that $P(A_i \cap A_j) - P(A_i)P(A_j) \le 0$, $i \ne j$.


Problem 2.10.19. (On the second Borel-Cantelli lemma.) Prove the following
variants of the second Borel-Cantelli lemma: given an arbitrary sequence of (not
necessarily independent) events $A_1, A_2, \ldots$, one can claim that:
(a) If

\[
\sum_{n=1}^{\infty} P(A_n) = \infty \quad \text{and} \quad
\liminf_n \frac{\sum_{i=1}^{n} \sum_{k=1}^{n} P(A_i A_k)}{\big[\sum_{k=1}^{n} P(A_k)\big]^2} = 1,
\]

then (Erdős and Rényi [37]) $P(A_n \text{ i.o.}) = 1$.
(b) If

\[
\sum_{n=1}^{\infty} P(A_n) = \infty \quad \text{and} \quad
\liminf_n \frac{\sum_{i=1}^{n} \sum_{k=1}^{n} P(A_i A_k)}{\big[\sum_{k=1}^{n} P(A_k)\big]^2} = L,
\]

then (Kochen and Stone [64], Spitzer [125]) $L \ge 1$ and $P(A_n \text{ i.o.}) \ge 1/L$.
(c) If

\[
\sum_{n=1}^{\infty} P(A_n) = \infty \quad \text{and} \quad
\liminf_n \frac{\sum_{1 \le i < k \le n} [P(A_i A_k) - P(A_i)P(A_k)]}{\big[\sum_{k=1}^{n} P(A_k)\big]^2} \le 0,
\]

then (Ortega and Wschebor [92]) $P(A_n \text{ i.o.}) = 1$.
(d) If $\sum_{n=1}^{\infty} P(A_n) = \infty$ and

\[
\alpha_H = \liminf_n \frac{\sum_{1 \le i < k \le n} [P(A_i A_k) - H\,P(A_i)P(A_k)]}{\big[\sum_{k=1}^{n} P(A_k)\big]^2},
\]

where $H$ is an arbitrary constant, then (Petrov [95]) $P(A_n \text{ i.o.}) \ge \dfrac{1}{H + 2\alpha_H}$ and
$H + 2\alpha_H \ge 1$.


Problem 2.10.20. Let $A_1, A_2, \ldots$ be some sequence of independent events and
suppose that $\sum_{n=1}^{\infty} P(A_n) = \infty$. Prove that for $S_n = \sum_{k=1}^{n} I(A_k)$ the following
stronger version of the "second Borel-Cantelli lemma" is in force:

\[ \lim_n \frac{S_n}{E S_n} = 1 \quad \text{(P-a.e.)}. \]


Problem 2.10.21. Let $(X_n)_{n \ge 1}$ and $(Y_n)_{n \ge 1}$ be any two sequences of random
variables with identical finite-dimensional distributions, i.e., $F_{X_1, \ldots, X_n} = F_{Y_1, \ldots, Y_n}$,
$n \ge 1$, and suppose that $X_n \xrightarrow{P} X$. Prove that there is a random variable $Y$, whose
distribution is identical to the distribution of $X$ (notation: $\mathrm{Law}(X) = \mathrm{Law}(Y)$, or
$X \stackrel{\mathrm{law}}{=} Y$, or $X \stackrel{d}{=} Y$), for which one can claim that $Y_n \xrightarrow{P} Y$.


Problem 2.10.22. Let $(X_n)_{n \ge 1}$ be a sequence of independent random variables with
$X_n \xrightarrow{P} X$, for some random variable $X$. Prove that $X$ must be a degenerate random
variable.


Problem 2.10.23. Prove that for every sequence of random variables, $\xi_1, \xi_2, \ldots$, it
is possible to find a sequence of constants, $a_1, a_2, \ldots$, so that $\xi_n/a_n \to 0$ (P-a.e.).


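Problem 2.10.24. Let $\xi_1, \xi_2, \ldots$ be a sequence of random variables and let $S_n = \xi_1 +
\cdots + \xi_n$, $n \ge 1$. Prove that the set $\{S_n \to\}$, i.e., the set of all $\omega \in \Omega$, for which the
series $\sum_{n \ge 1} \xi_n(\omega)$ converges, can be represented in the form:

\[
\{S_n \to\} = \bigcap_{N \ge 1} \bigcup_{m \ge 1} \bigcap_{k \ge m} \Big\{ \sup_{l \ge k} |S_l - S_k| \le N^{-1} \Big\}.
\]

Similarly, the set $\{S_n \nrightarrow\}$, on which the series $\sum_{n \ge 1} \xi_n(\omega)$ diverges, can be
represented in the form

\[
\{S_n \nrightarrow\} = \bigcup_{N \ge 1} \bigcap_{m \ge 1} \bigcup_{k \ge m} \Big\{ \sup_{l \ge k} |S_l - S_k| > N^{-1} \Big\}.
\]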


Problem 2.10.25. Consider the probability space $(\Omega, \mathscr{F}, P)$, in which the sample
space $\Omega$ is at most countable, and prove that if $\xi_n \xrightarrow{P} \xi$, then $\xi_n \to \xi$ (P-a.e.).


Problem 2.10.26. Give an example of a sequence of random variables, such that
with probability 1 one has $\limsup \xi_n = \infty$, $\liminf \xi_n = -\infty$, but, nevertheless, one
can find a random variable $\eta$ with $\xi_n \xrightarrow{P} \eta$.


Problem 2.10.27. Prove the following version of the 0-1 law (comp. with the
0-1 law of [P §4.1]): if the events $A_1, A_2, \ldots$ are pairwise independent, then

\[
P\{A_n \text{ i.o.}\} =
\begin{cases}
0, & \text{if } \sum P(A_n) < \infty, \\
1, & \text{if } \sum P(A_n) = \infty.
\end{cases}
\]

Problem 2.10.28. Let $A_1, A_2, \ldots$ be an arbitrary sequence of events, such that
$\lim_n P(A_n) = 0$ and $\sum_n P(A_n \cap \overline{A}_{n+1}) < \infty$. Prove that $P\{A_n \text{ i.o.}\} = 0$.


Problem 2.10.29. Prove that if $\sum_n P\{|\xi_n| > n\} < \infty$, then $\limsup_n (|\xi_n|/n) \le 1$
(P-a.e.).


Problem 2.10.30. Suppose that $\xi_n \downarrow \xi$ (P-a.e.), $E|\xi_n| < \infty$, $n \ge 1$, and $\inf_n E\xi_n >
-\infty$. Prove that $\xi_n \xrightarrow{L^1} \xi$, i.e., $E|\xi_n - \xi| \to 0$.


Problem 2.10.31. In conjunction with the second Borel-Cantelli lemma, prove that
$P\{A_n \text{ i.o.}\} = 1$ if and only if $\sum_n P(A \cap A_n) = \infty$, for every set $A$ with $P(A) > 0$.


Problem 2.10.32. Suppose that the events $A_1, A_2, \ldots$ are independent and chosen
so that $P(A_n) < 1$, for all $n \ge 1$. Prove that $P\{A_n \text{ i.o.}\} = 1$ if and only if

\[ P\Big(\bigcup A_n\Big) = 1. \]


Problem 2.10.33. Let $X_1, X_2, \ldots$ be any sequence of independent random variables
with $P\{X_n = 0\} = 1/n$ and $P\{X_n = 1\} = 1 - 1/n$. Set $E_n = \{X_n = 0\}$. By
using the properties $\sum_{n=1}^{\infty} P(E_n) = \infty$, $\sum_{n=1}^{\infty} P(\overline{E}_n) = \infty$, conclude that $\lim_n X_n$
does not exist (P-a.e.).


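Problem 2.10.34. Let $X_1, X_2, \ldots$ be any sequence of random variables. Prove that
$X_n \xrightarrow{P} 0$ if and only if

\[ E\,\frac{|X_n|^r}{1 + |X_n|^r} \to 0, \quad \text{for some } r > 0. \]

In particular, if $S_n = X_1 + \cdots + X_n$, then

\[
\frac{S_n - E S_n}{n} \xrightarrow{P} 0 \iff E\,\frac{(S_n - E S_n)^2}{n^2 + (S_n - E S_n)^2} \to 0.
\]

Show also that, given any sequence of random variables $X_1, X_2, \ldots$, one can claim
that

\[ \max_{1 \le k \le n} |X_k| \xrightarrow{P} 0 \implies \frac{S_n}{n} \xrightarrow{P} 0. \]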


Problem 2.10.35. Let $X_1, X_2, \ldots$ be any sequence of independent and identically
distributed Bernoulli random variables with $P\{X_1 = \pm 1\} = 1/2$. Setting $U_n =
\sum_{k=1}^{n} X_k/2^k$, $n \ge 1$, prove that $U_n \to U$ (P-a.e.), where $U$ is some random variable,
which is distributed uniformly on $[-1, +1]$.
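
A quick Monte Carlo experiment makes the claimed limit plausible before one proves it. The sketch below is an illustration under stated assumptions: the truncation level (40 terms), the sample size and the seed are arbitrary choices, and `sample_u` is a name introduced here. It compares a few empirical characteristics of $U_{40}$ with those of the uniform distribution on $[-1, 1]$ (mean $0$, second moment $1/3$, $P\{U \le 1/2\} = 3/4$).

```python
import random

def sample_u(n_terms, rng):
    """One draw of U_n = sum_{k=1}^{n} X_k / 2^k with X_k = +1 or -1."""
    return sum(rng.choice((-1, 1)) / 2 ** k for k in range(1, n_terms + 1))

rng = random.Random(12345)          # fixed seed, chosen arbitrarily
draws = [sample_u(40, rng) for _ in range(20_000)]

mean = sum(draws) / len(draws)
second_moment = sum(u * u for u in draws) / len(draws)
frac_below_half = sum(u <= 0.5 for u in draws) / len(draws)
print(mean, second_moment, frac_below_half)
```

The simulation does not replace the proof; it only shows that the binary-digit structure of $U_n$ already produces an essentially uniform sample for moderate $n$.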


Problem 2.10.36. (Egoroff's Theorem.) Let $(\Omega, \mathscr{F}, \mu)$ be any measurable space,
endowed with a finite measure $\mu$, and let $f_1, f_2, \ldots$ be some sequence of Borel
functions, which converges in measure $\mu$ to the Borel function $f$, i.e., $f_n \xrightarrow{\mu} f$.
Egoroff's Theorem states that for every given $\varepsilon > 0$ it is possible to find a set
$A_\varepsilon \in \mathscr{F}$, with $\mu(A_\varepsilon) < \varepsilon$, such that $f_n(\omega) \to f(\omega)$ uniformly for all $\omega \in \overline{A}_\varepsilon$,
where $\overline{A}_\varepsilon = \Omega \setminus A_\varepsilon$ is the complement of $A_\varepsilon$. Prove this statement.


Problem 2.10.37. (Luzin's theorem.) Let $(\Omega, \mathscr{F}, P) = ([a, b], \mathscr{F}, \lambda)$, where $\lambda$
stands for the Lebesgue measure on $[a, b]$ and $\mathscr{F}$ is the collection of all Lebesgue
sets. Let $f = f(x)$ be any finite $\mathscr{F}$-measurable function. Prove Luzin's Theorem:
for every given $\varepsilon > 0$ one can find a continuous function $f_\varepsilon = f_\varepsilon(x)$, such that

\[ P\{x \in [a, b] : f(x) \ne f_\varepsilon(x)\} < \varepsilon. \]


Problem 2.10.38. The statement of Egoroff's Theorem leads naturally to the notion
of almost uniform convergence. We say that the sequence of functions $f_1, f_2, \ldots$
converges almost uniformly to the function $f$, if, for every $\varepsilon > 0$, it is possible to
find a set $A_\varepsilon \in \mathscr{F}$ with $\mu(A_\varepsilon) < \varepsilon$, so that $f_n(\omega) \to f(\omega)$ uniformly for all $\omega \in \overline{A}_\varepsilon$
(notation: $f_n \rightrightarrows f$).
Prove that the almost uniform convergence $f_n \rightrightarrows f$ implies both convergence in
measure ($f_n \xrightarrow{\mu} f$) and convergence almost surely ($f_n \xrightarrow{\mu\text{-a.s.}} f$).


Problem 2.10.39. Let $X_1, X_2, \ldots$ be a sequence of random variables and let $\{X_n \nrightarrow\}$
denote the set of those $\omega \in \Omega$ for which $X_n(\omega)$ does not converge as $n \to \infty$. Prove
that

\[ \{X_n \nrightarrow\} = \bigcup_{p < q} \{\liminf X_n < p < q < \limsup X_n\}, \]

where the union is taken over all pairs of rational numbers, $(p, q)$, with $p < q$.


Problem 2.10.40. Let $X_1, X_2, \ldots$ be any sequence of random variables defined
on some complete probability space, which converges with probability 1 to the
random variable $X$. Show that the σ-algebras $\sigma(X_1, X_2, \ldots)$ and $\sigma(X_1, X_2, \ldots, X)$,
generated, respectively, by the random elements $(X_1, X_2, \ldots)$ and $(X_1, X_2, \ldots, X)$
(see [P §2.5]) coincide.


Problem 2.10.41. Let X1, X2,... be any sequence of independent and identically 
distributed random variables, such that their distribution function F = F(x) 
satisfies the condition 


lim x’[1 — F(x)] = 
x=*00 
Prove that 


P 
re a as n—> oœ. 
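Problem 2.10.42. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E\xi_1 = \mu$, $D\xi_1 = \sigma^2 < \infty$ and $P\{\xi_1 = 0\} = 0$.
Prove that

\[
\frac{\sum_{k=1}^{n} \xi_k}{\sum_{k=1}^{n} \xi_k^2} \xrightarrow{P} \frac{\mu}{\mu^2 + \sigma^2} \quad \text{as } n \to \infty.
\]


Problem 2.10.43. Suppose that $\xi_n \xrightarrow{P} \xi$, $\eta_n \xrightarrow{P} \eta$ and $P\{\xi_n \le \eta_n\} = 1$, $n \ge 1$.
Prove that $P\{\xi \le \eta\} = 1$.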


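Problem 2.10.44. Let $\xi_1, \xi_2, \ldots$ be any sequence of non-negative random variables
and suppose that the σ-algebras $\mathscr{F}_1, \mathscr{F}_2, \ldots$ are such that $E(\xi_n \mid \mathscr{F}_n) \xrightarrow{P} 0$. Prove
that $\xi_n \xrightarrow{P} 0$.

Problem 2.10.45. Let $\xi_n \xrightarrow{d} \xi$ and $c_n \xi_n \xrightarrow{d} \xi$ ("$\xrightarrow{d}$" means convergence in
distribution), where $\xi$ is some non-degenerate random variable and $c_n > 0$. Prove
that $c_n \to 1$.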


Problem 2.10.46. Let $A_1, A_2, \ldots$ be any sequence of random events. Setting $A =
\varlimsup_n A_n$, prove that, if $\sum_{n=1}^{\infty} P(A_n) = \infty$, then the following relation, known as
the Kochen-Stone inequality (see [64]), must hold:

\[
P(A) \ge \varlimsup_n \frac{\big(\sum_{k=1}^{n} P(A_k)\big)^2}{\sum_{k=1}^{n} \sum_{l=1}^{n} P(A_k \cap A_l)}.
\]


Problem 2.10.47. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E|\xi_1| < \infty$. Given some positive constant, $a$, set
$A_n = \{|\xi_n| > an\}$, $n \ge 1$, and prove that $P(\varlimsup A_n) = 0$.


Problem 2.10.48. Prove that in the space of continuous functions, $C$, there is no
metric $\rho$ for which the convergence $\rho(f_n, f) \to 0$ is equivalent to the point-wise
convergence $f_n \to f$ (comp. with Problem 2.10.12).


Problem 2.10.49. Assuming that $c > 0$ is an arbitrary constant, give an example
of a sequence of random variables $\xi, \xi_1, \xi_2, \ldots$, such that $E\xi_n = -c$ for all $n \ge 1$,
$\xi_n(\omega) \to \xi(\omega)$ in point-wise sense, and yet $E\xi = c$.


Problem 2.10.50. For each of the three definitions in Problem 1.4.23 find the
median $\mu_n = \mu(\xi_n)$ of the random variables $\xi_n = I_{\{\eta < (-1)^n - 1/n\}}$, where $\eta$ is a
standard Gaussian random variable.


Problem 2.10.51. Let $\mu_n = \mu(\xi_n)$ be the uniquely defined medians (see
Problem 1.4.23) of the random variables $\xi_n$, $n \ge 1$, which converge almost surely to
a random variable $\xi$. Give an example showing that, in general, $\lim_n \mu(\xi_n)$ may
not exist.


Problem 2.10.52. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent, non-negative and
identically distributed non-degenerate random variables with $E\xi_1 = 1$. Setting $T_n =
\prod_{k=1}^{n} \xi_k$, $n \ge 1$, prove that $T_n \to 0$ (P-a.e.).


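Problem 2.10.53. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables and let $S_n = \xi_1 + \cdots + \xi_n$, $n \ge 1$. Prove that:

(a) $E\xi_1^+ = \infty$ and $E\xi_1^- < \infty \implies \dfrac{S_n}{n} \to +\infty$ (P-a.e.);

(b) $E|\xi_1| < \infty$ and $E\xi_1 \ne 0 \implies \dfrac{\max(|\xi_1|, \ldots, |\xi_n|)}{|S_n|} \to 0$ (P-a.e.);

(c) $E\xi_1^2 < \infty \implies \dfrac{\max(|\xi_1|, \ldots, |\xi_n|)}{\sqrt{n}} \to 0$ (P-a.e.).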


Problem 2.10.54. Let $\xi_1, \xi_2, \ldots$ be any sequence of random variables and let $\xi$ be a
random variable. Prove that for every $p \ge 1$ the following conditions are equivalent:

(a) $\xi_n \xrightarrow{L^p} \xi$ (i.e., $E|\xi_n - \xi|^p \to 0$);
(b) $\xi_n \xrightarrow{P} \xi$ and the family $\{|\xi_n|^p,\ n \ge 1\}$ is uniformly integrable.


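Problem 2.10.55. Let $\xi_1, \xi_2, \ldots$ be some sequence of independent and identically
distributed random variables, chosen so that $P\{\xi_1 > x\} = e^{-x}$, $x \ge 0$ (i.e., each
random variable is exponentially distributed). Prove that

\[
P\{\xi_n > \alpha \ln n \text{ i.o.}\} =
\begin{cases}
1, & \text{if } \alpha \le 1, \\
0, & \text{if } \alpha > 1,
\end{cases}
\]

\[
P\{\xi_n > \ln n + \alpha \ln \ln n \text{ i.o.}\} =
\begin{cases}
1, & \text{if } \alpha \le 1, \\
0, & \text{if } \alpha > 1,
\end{cases}
\]

and, in general, for every $k \ge 1$ one has

\[
P\{\xi_n > \ln n + \ln \ln n + \ldots + \underbrace{\ln \ldots \ln}_{k \text{ times}} n + \alpha \underbrace{\ln \ldots \ln}_{k+1 \text{ times}} n \text{ i.o.}\} =
\begin{cases}
1, & \text{if } \alpha \le 1, \\
0, & \text{if } \alpha > 1.
\end{cases}
\]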


Problem 2.10.56. Prove the following generalization of [P §2.10, Theorem 3],
which is concerned with situations where convergence in $L^1$ comes as a consequence
of convergence a.e.: if $\xi$ is a random variable and $(\xi_n)_{n \ge 1}$ is some sequence
of random variables, chosen so that $E|\xi| < \infty$, $E|\xi_n| < \infty$, and $\xi_n \to \xi$ (P-a.e.),
then $E|\xi_n - \xi| \to 0$ if and only if $E|\xi_n| \to E|\xi|$ as $n \to \infty$. This statement is
known as Scheffé's lemma. (Comp. with the statement in Problem 2.6.19.)


Problem 2.10.57. Let $\xi_1, \xi_2, \ldots$ be any sequence of positive, independent and
identically distributed random variables that share one and the same density $f =
f(x)$, with $\lim_{x \downarrow 0} f(x) = \lambda > 0$. Prove that

\[ n \min(\xi_1, \ldots, \xi_n) \xrightarrow{d} \eta, \]

where $\eta$ is an exponentially distributed random variable with parameter $\lambda$.


Problem 2.10.58. Prove that if one of the conditions (i), (ii), or (iii) in the
assumptions of Problem 2.6.33 holds, then

\[ E|X_n|^p \to E|X|^p, \quad \text{for all } 0 < p < r. \]


Problem 2.10.59. Let $(\xi_n)_{n \ge 1}$ be any sequence of independent and normally
distributed random variables, i.e., $\xi_n \sim \mathscr{N}(\mu_n, \sigma_n^2)$. Prove that the series $\sum_{n \ge 1} \xi_n^2$
converges in $L^1$ if and only if

\[ \sum_{n \ge 1} (\mu_n^2 + \sigma_n^2) < \infty. \]

Show also that when the above condition holds the series $\sum_{n \ge 1} \xi_n^2$ converges in $L^p$
for all $p \ge 1$.
Hint. To prove the second statement, one has to establish that

\[ \Big\| \sum_{n \ge 1} \xi_n^2 \Big\|_p < \infty, \quad \text{for all } p \ge 1. \]
p 


Problem 2.10.60. Let $X_1, X_2, \ldots$ be independent random variables that are uniformly
distributed on the interval $[0, 1]$. Setting $Y_n = X_1 \cdots X_n$, $n \ge 1$, consider the
series $\sum_{n=1}^{\infty} z^n Y_n$, and prove that its radius of convergence, $R = R(\omega)$, equals the
constant $e$ with probability 1.

Hint. Use the relation $1/R = \varlimsup_n |Y_n|^{1/n}$.
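
The relation in the hint can be explored numerically, since $|Y_n|^{1/n} = \exp\big(\frac{1}{n} \sum_{k \le n} \ln X_k\big)$ and $E \ln X_1 = -1$. The sketch below is only a numerical illustration for one large $n$ (it says nothing rigorous about the upper limit); the function name `root_estimate`, the seed and the sample size are assumptions made for the example.

```python
import math
import random

def root_estimate(n, rng):
    """|Y_n|^(1/n) for Y_n = X_1 * ... * X_n, computed through logarithms."""
    log_sum = 0.0
    for _ in range(n):
        x = 1.0 - rng.random()      # uniform on (0, 1], avoids log(0)
        log_sum += math.log(x)
    return math.exp(log_sum / n)

rng = random.Random(2024)           # arbitrary fixed seed
estimate = root_estimate(200_000, rng)
print(estimate, 1 / math.e, 1 / estimate)
```

Working with the sum of logarithms rather than the product itself avoids floating-point underflow, because $Y_n$ decays roughly like $e^{-n}$.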


Problem 2.10.61. Let $(\Omega, \mathscr{F}, P) = ([0, 1), \mathscr{B}([0, 1)), \lambda)$, where $\lambda$ denotes the
Lebesgue measure, and let $\omega = (a_1, a_2, \ldots)$ be the continued fraction expansion
of $\omega \in [0, 1)$ (in particular, $a_n = a_n(\omega)$ are integer numbers); see [2]. Prove that
as $n \to \infty$ one has

\[
\lambda\{\omega : a_n(\omega) = k\} \to \frac{1}{\ln 2} \ln\Big[\frac{1 + 1/k}{1 + 1/(k + 1)}\Big].
\]


Remark. Discussion of the origins of this problem and various approaches to its 
solution can be found in the “Essay on the history of probability theory” in the book 
“Probability-2” (see [121]) and on p. 101 in Arnold’s book [3]. 


Problem 2.10.62. Let $X_1, X_2, \ldots$ be independent and identically distributed random
variables with $P\{X_1 = 0\} = P\{X_1 = 2\} = 1/2$. Prove that
(a) The series $\sum_{n=1}^{\infty} X_n/3^n$ converges almost surely to some random variable $X$.
(b) The distribution function of the random variable $X$ is the Cantor function
(see [P §2.3]).


2.11 Hilbert Spaces of Random Variables with Finite 
Second Moments 


Problem 2.11.1. Prove that if $\xi = \mathrm{l.i.m.}\ \xi_n$, then $\|\xi_n\| \to \|\xi\|$.


Problem 2.11.2. Prove that if $\xi = \mathrm{l.i.m.}\ \xi_n$ and $\eta = \mathrm{l.i.m.}\ \eta_n$, then $(\xi_n, \eta_n) \to
(\xi, \eta)$.


Problem 2.11.3. Prove that the norm $\|\cdot\|$ satisfies the "parallelogram law:"

\[ \|\xi + \eta\|^2 + \|\xi - \eta\|^2 = 2(\|\xi\|^2 + \|\eta\|^2). \]


Problem 2.11.4. Let $\{\xi_1, \ldots, \xi_n\}$ be any family of orthogonal random variables.
Prove that

\[ \Big\| \sum_{i=1}^{n} \xi_i \Big\|^2 = \sum_{i=1}^{n} \|\xi_i\|^2. \]

This property is known as the "Pythagorean theorem."


Problem 2.11.5. Let $\xi_1, \xi_2, \ldots$ be any sequence of orthogonal random variables
and let $S_n = \xi_1 + \cdots + \xi_n$. Prove that if $\sum_{n=1}^{\infty} E\xi_n^2 < \infty$, then one can find a random
variable $S$ with $ES^2 < \infty$, so that $\mathrm{l.i.m.}\ S_n = S$, i.e., $\|S_n - S\|^2 = E|S_n - S|^2 \to 0$
as $n \to \infty$.


Hint. According to Problem 2.11.4 one must have

\[ \|S_{n+k} - S_n\|^2 = \sum_{m=n+1}^{n+k} \|\xi_m\|^2. \]


Problem 2.11.6. Prove that Rademacher's functions $R_n$ can be defined by the
relation

\[ R_n(x) = \operatorname{sign}(\sin 2^n \pi x), \quad 0 \le x \le 1, \ n = 1, 2, \ldots \]
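
The relation can be spot-checked against a digit-based definition. The sketch below assumes that $R_n(x) = 1 - 2x_n$, where $x_n$ is the $n$-th binary digit of $x$; this sign convention is an assumption about the definition in [P §2.11], and the helper names and test points are chosen for illustration. Only non-dyadic rational points are used, so that $\sin 2^n \pi x$ stays well away from zero.

```python
import math
from fractions import Fraction

def rademacher_sin(n, x):
    """sign(sin(2^n * pi * x)), evaluated in floating point."""
    s = math.sin(2 ** n * math.pi * float(x))
    return (s > 0) - (s < 0)

def rademacher_digit(n, x):
    """1 - 2 * (n-th binary digit of x), computed exactly for a Fraction x."""
    digit = math.floor(2 ** n * x) % 2
    return 1 - 2 * digit

points = [Fraction(1, 3), Fraction(2, 7), Fraction(5, 11), Fraction(9, 13)]
agree = all(
    rademacher_sin(n, x) == rademacher_digit(n, x)
    for x in points
    for n in range(1, 11)
)
print(agree)
```

The agreement reflects the fact that $\sin \pi y > 0$ exactly when the integer part of $y$ is even, applied to $y = 2^n x$.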


Problem 2.11.7. Prove that for any $\xi \in L^2(\Omega, \mathscr{F}, P)$ and for any sub-σ-algebra
$\mathscr{G} \subseteq \mathscr{F}$ one has

\[ \|\xi\| \ge \|E(\xi \mid \mathscr{G})\|, \]

with equality taking place if and only if $\xi = E(\xi \mid \mathscr{G})$ (P-a.e.).


Problem 2.11.8. Prove that if $X, Y \in L^2(\Omega, \mathscr{F}, P)$, $E(X \mid Y) = Y$ and
$E(Y \mid X) = X$, then $X = Y$ (P-a.e.). In fact, the assumption $X, Y \in L^2(\Omega, \mathscr{F}, P)$
can be relaxed to $X, Y \in L^1(\Omega, \mathscr{F}, P)$, but under this weaker assumption the
property $X = Y$ (P-a.e.) is much harder to establish; see Problem 2.7.24.


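Problem 2.11.9. Suppose that $\mathscr{F}$ is a σ-algebra and that $(\mathscr{G}_n^{(1)})$, $(\mathscr{G}_n^{(2)})$ and $(\mathscr{G}_n^{(3)})$
are three sequences of sub-σ-algebras that are contained in $\mathscr{F}$ and are chosen so
that

\[ \mathscr{G}_n^{(1)} \subseteq \mathscr{G}_n^{(2)} \subseteq \mathscr{G}_n^{(3)}, \quad \text{for every } n. \]

Then suppose that $\xi$ is some $\mathscr{F}$-measurable and bounded random variable, for which
one can find a random variable $\eta$ with

\[ E(\xi \mid \mathscr{G}_n^{(1)}) \xrightarrow{P} \eta \quad \text{and} \quad E(\xi \mid \mathscr{G}_n^{(3)}) \xrightarrow{P} \eta. \]

Prove that when the above conditions hold one must also have $E(\xi \mid \mathscr{G}_n^{(2)}) \xrightarrow{P} \eta$.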


Problem 2.11.10. Let $x \mapsto f(x)$ be any Borel-measurable function, which is
defined on $[0, \infty)$ and is such that $\int_0^{\infty} e^{-\lambda x} f(x)\,dx = 0$, for any $\lambda > 0$. Prove
that $f = 0$ almost everywhere relative to the Lebesgue measure on $[0, \infty)$.


Problem 2.11.11. Suppose that the random variable $\eta$ is uniformly distributed in
$[-1, 1]$ and let $\xi = \eta^2$. Prove that:

(a) The optimal (in terms of the mean-square distance) estimate for $\xi$ given $\eta$,
and for $\eta$ given $\xi$, can be expressed, respectively, as

\[ E(\xi \mid \eta) = \eta^2 \quad \text{and} \quad E(\eta \mid \xi) = 0. \]

(b) The respective optimal linear estimates can be expressed as

\[ \hat{E}(\xi \mid \eta) = 1/3 \quad \text{and} \quad \hat{E}(\eta \mid \xi) = 0. \]
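
Part (b) can be verified with exact arithmetic from the moments of $\eta$. The sketch below is an illustration only: it uses the standard formula for the best linear predictor $a + bx$ of $y$ (with $b = \mathrm{cov}(x, y)/\mathrm{Var}\,x$ and $a = Ey - b\,Ex$) and the moments $E\eta^k = 1/(k+1)$ for even $k$, $0$ for odd $k$; the helper names `moment` and `best_linear` are introduced for the example.

```python
from fractions import Fraction

def moment(k):
    """E eta^k for eta uniform on [-1, 1]: 1/(k+1) for even k, 0 for odd k."""
    return Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)

def best_linear(e_x, e_y, e_xy, e_xx):
    """Coefficients (a, b) minimising E(y - a - b x)^2, assuming Var x > 0."""
    var_x = e_xx - e_x ** 2
    b = (e_xy - e_x * e_y) / var_x
    return e_y - b * e_x, b

# predict xi = eta^2 from eta: x = eta, y = eta^2
a1, b1 = best_linear(moment(1), moment(2), moment(3), moment(2))
# predict eta from xi = eta^2: x = eta^2, y = eta
a2, b2 = best_linear(moment(2), moment(1), moment(3), moment(4))
print(a1, b1, a2, b2)
```

In both cases the slope vanishes because $E\eta^3 = 0$, so the linear estimates reduce to the constants $E\xi = 1/3$ and $E\eta = 0$, even though $\xi$ is a deterministic function of $\eta$.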


2.12 Characteristic Functions 


Problem 2.12.1. Let $\xi$ and $\eta$ be two independent random variables and suppose that
$f(x) = f_1(x) + i f_2(x)$, $g(x) = g_1(x) + i g_2(x)$, where $f_k(x)$, $g_k(x)$, $k = 1, 2$, are
Borel functions. Prove that if $E|f(\xi)| < \infty$ and $E|g(\eta)| < \infty$, then

\[ E|f(\xi) g(\eta)| < \infty \]

and

\[ E f(\xi) g(\eta) = E f(\xi) \cdot E g(\eta). \]

(Recall that by definition $E f(\xi) = E f_1(\xi) + i E f_2(\xi)$, $E|f(\xi)| = E(f_1^2(\xi) + f_2^2(\xi))^{1/2}$.)


Problem 2.12.2. Let $\xi = (\xi_1, \ldots, \xi_n)$ and $E\|\xi\|^n < \infty$, where $\|\xi\| = \sqrt{\sum \xi_i^2}$.
Prove that

\[ \varphi_\xi(t) = \sum_{k=0}^{n} \frac{i^k}{k!} E(t, \xi)^k + \varepsilon_n(t) \|t\|^n, \]

where $t = (t_1, \ldots, t_n)$, $(t, \xi) = t_1 \xi_1 + \ldots + t_n \xi_n$ and $\varepsilon_n(t) \to 0$ as $\|t\| \to 0$.
Hint. The proof should be analogous to the one in the one-dimensional case,
after replacing $t\xi$ with $(t, \xi)$.


Problem 2.12.3. Prove [P §2.12, Theorem 2] for n-dimensional distribution functions
of the form $F = F_n(x_1, \ldots, x_n)$ and $G = G_n(x_1, \ldots, x_n)$.


Problem 2.12.4. Let $F = F(x_1, \ldots, x_n)$ be any multivariate distribution function
and let $\varphi = \varphi(t_1, \ldots, t_n)$ be the associated characteristic function. By using the
notation from equation [P §2.3, (12)], prove the multivariate inversion formula:

\[
P(a, b] = \lim_{c \to \infty} \frac{1}{(2\pi)^n} \int_{-c}^{c} \cdots \int_{-c}^{c}
\prod_{k=1}^{n} \frac{e^{-i t_k a_k} - e^{-i t_k b_k}}{i t_k}\, \varphi(t_1, \ldots, t_n)\,dt_1 \ldots dt_n.
\]

(In the above formula it is assumed that the set $(a, b]$, where $a = (a_1, \ldots, a_n)$
and $b = (b_1, \ldots, b_n)$, is a continuity interval for the function $P(a, b]$, in the sense
that for all $k = 1, \ldots, n$ the marginal distribution functions $F_k(x_k)$, obtained from
$F(x_1, \ldots, x_n)$ by setting all arguments except $x_k$ to $+\infty$, are continuous at the
points $a_k$, $b_k$.)


Problem 2.12.5. Let $\varphi_k(t)$, $k \ge 1$, be any sequence of characteristic functions and
let $\lambda_k$, $k \ge 1$, be any sequence of non-negative numbers with $\sum \lambda_k = 1$. Prove that
$t \mapsto \sum \lambda_k \varphi_k(t)$ must be a characteristic function.


Problem 2.12.6. Assuming that $\varphi(t)$ is a characteristic function, is it true that
$\mathrm{Re}\,\varphi(t)$ and $\mathrm{Im}\,\varphi(t)$ are also characteristic functions?

Hint. Let $\varphi = \varphi(t)$ be the characteristic function for some distribution $P$. To
answer the question regarding $\mathrm{Re}\,\varphi(t)$, consider the distribution $Q$ with $Q(A) =
\frac{1}{2}[P(A) + P(-A)]$, where $-A = \{-x : x \in A\}$. To answer the question regarding
$\mathrm{Im}\,\varphi(t)$, consider the characteristic function $\varphi(t) \equiv 1$.


Problem 2.12.7. Let $\varphi_1, \varphi_2, \varphi_3$ be any three characteristic functions with $\varphi_1 \varphi_2 =
\varphi_1 \varphi_3$. Can one conclude that $\varphi_2 = \varphi_3$?


Problem 2.12.8. Prove the formulas for the characteristic functions listed in
[P §2.12, Tables 4 and 5].

Hint. The characteristic functions for the first five discrete distributions can be
obtained with elementary calculations.


In the case of the negative binomial distribution ($C_{k-1}^{r-1} p^r q^{k-r}$, $k = r, r+1, \ldots$
and $r = 1, 2, \ldots$), notice that for $|z| < 1$ one has

\[ \sum_{k=r}^{\infty} C_{k-1}^{r-1} z^{k-r} = (1 - z)^{-r}. \]

In the case of the characteristic function $\varphi(t)$, associated with the normal
distribution $\mathscr{N}(m, \sigma^2)$, notice that with $m = 0$ and $\sigma^2 = 1$, according to the general
theory of functions of complex variables, one must have

\[
\varphi(t) = \frac{1}{\sqrt{2\pi}} \int_{R} e^{itx} e^{-x^2/2}\,dx
= e^{-t^2/2} \int_{R} \frac{1}{\sqrt{2\pi}}\, e^{-(x - it)^2/2}\,dx
= e^{-t^2/2} \int_{L'} f(z)\,dz,
\]

where $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$, $L' = \{z : \mathrm{Im}\, z = -t\}$, and

\[ \int_{L'} f(z)\,dz = \int_{L} f(z)\,dz = 1, \]

where $L = \{z : \mathrm{Im}\, z = 0\}$.

The characteristic function of the gamma-distribution can be computed in a
similar fashion.

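As for the characteristic function $\varphi(t)$, associated with the Cauchy distribution,
notice that for $t > 0$ one has

\[ \varphi(t) = \int_{R} e^{itx}\, \frac{\theta}{\pi (x^2 + \theta^2)}\,dx = \int_{L} f(z)\,dz, \]

where $L = \{z : \mathrm{Im}\, z = 0\}$ and $f(z) = \dfrac{\theta\, e^{itz}}{\pi (z^2 + \theta^2)}$. By Cauchy's residue theorem
and the Jordan lemma (see [47, vol. 1]) one has

\[ \int_{L} f(z)\,dz = 2\pi i \operatorname*{res}_{i\theta} f = e^{-t\theta}. \]

Similarly, for $t < 0$ one can prove that $\varphi(t) = e^{t\theta}$, so that $\varphi(t) = e^{-\theta |t|}$ for any
real $t$.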


Problem 2.12.9. Let $\xi$ be any integer-valued random variable and let $\varphi_\xi(t)$ be its
characteristic function. Prove that

\[ P\{\xi = k\} = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikt} \varphi_\xi(t)\,dt, \quad k = 0, \pm 1, \pm 2, \ldots. \]


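The inversion formula for integer-valued random variables lends itself to a direct numerical check. The sketch below is an illustration only, using the Poisson distribution, whose characteristic function is $\exp\{\lambda (e^{it} - 1)\}$; the parameter value, the number of quadrature nodes and the function names `poisson_cf` and `pmf_from_cf` are choices made for the example.

```python
import cmath
import math

def poisson_cf(t, lam):
    """Characteristic function of the Poisson distribution with mean lam."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

def pmf_from_cf(cf, k, steps=2048):
    """(1 / 2 pi) * integral over [-pi, pi] of exp(-i k t) cf(t) dt."""
    h = 2 * math.pi / steps
    total = 0j
    for j in range(steps):
        t = -math.pi + j * h        # rectangle rule on a periodic integrand
        total += cmath.exp(-1j * k * t) * cf(t)
    return (total * h / (2 * math.pi)).real

lam = 2.5                           # assumed parameter
for k in range(5):
    exact = math.exp(-lam) * lam ** k / math.factorial(k)
    print(k, pmf_from_cf(lambda t: poisson_cf(t, lam), k), exact)
```

Because the integrand is $2\pi$-periodic, the equally spaced rectangle rule is extremely accurate here, so the recovered probabilities agree with the exact Poisson weights to many digits.
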
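Problem 2.12.10. Consider the space $L^2 = L^2([-\pi, \pi])$, endowed with the Borel
σ-algebra $\mathscr{B}[-\pi, \pi]$ and the Lebesgue measure, and prove that the collection of
functions $\big\{ \frac{1}{\sqrt{2\pi}}\, e^{inx},\ n = 0, \pm 1, \pm 2, \ldots \big\}$ forms an orthonormal basis in that space.


Hint. Use the following steps:
(a) For a given $\varphi \in L^2$ and a given $\varepsilon > 0$ find a constant $c > 0$ such that

\[ \|\varphi - f\|_{L^2} < \varepsilon, \]

where $f(x) = \varphi(x) I(|\varphi(x)| \le c)$.
(b) By using Lusin's theorem (see Problem 2.10.37), find a continuous function
$f_\varepsilon(x)$ such that $|f_\varepsilon(x)| \le c$ and

\[ \mu\{x \in [-\pi, \pi] : f_\varepsilon(x) \ne f(x)\} < \varepsilon, \]

where $\mu$ is the Lebesgue measure, so that $\|f - f_\varepsilon\|_{L^2} \le 2c\sqrt{\varepsilon}$.


(c) Find a continuous function $\tilde{f}_\varepsilon(x)$ with the property $\tilde{f}_\varepsilon(-\pi) = \tilde{f}_\varepsilon(\pi)$ and

\[ \|f_\varepsilon - \tilde{f}_\varepsilon\|_{L^2} < \varepsilon. \]

(d) By using the Weierstrass theorem find a function $\bar{f}_\varepsilon(x) = \sum_{k=-n}^{n} a_k e^{ikx}$
with the property

\[ \sup_{x \in [-\pi, \pi]} |\tilde{f}_\varepsilon(x) - \bar{f}_\varepsilon(x)| < \varepsilon, \]

which implies $\|\tilde{f}_\varepsilon - \bar{f}_\varepsilon\|_{L^2} \le \sqrt{2\pi}\,\varepsilon$.
Conditions (a)-(d) above imply that the collection of all finite sums
of the form $\sum_{k=-n}^{n} a_k e^{ikx}$ is everywhere dense in $L^2$, i.e., the system
$\big\{ \frac{1}{\sqrt{2\pi}}\, e^{inx},\ n = 0, \pm 1, \pm 2, \ldots \big\}$ forms an orthonormal basis.

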
Problem 2.12.11. In the statement of the Bochner-Khinchin theorem it is assumed
that the function under consideration, $\varphi(t)$, is continuous. Prove the following result
(due to F. Riesz), which shows to what extent it may be possible to remove the
continuity assumption from the Bochner-Khinchin theorem.

Let $\varphi = \varphi(t)$ be any complex-valued and Borel-measurable function with the
property $\varphi(0) = 1$. Then one can claim that the function $\varphi = \varphi(t)$ is positive
definite if and only if it coincides with some characteristic function Lebesgue-almost
everywhere on the real line.


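Problem 2.12.12. Which of the functions

\[
\begin{gathered}
\varphi(t) = e^{-|t|^{\alpha}},\ 0 < \alpha \le 2, \qquad \varphi(t) = e^{-|t|^{\alpha}},\ \alpha > 2, \\
\varphi(t) = (1 + |t|)^{-1}, \qquad \varphi(t) = (1 + t^4)^{-1}, \\
\varphi(t) = \begin{cases} 1 - |t|^3, & |t| \le 1, \\ 0, & |t| > 1, \end{cases} \qquad
\varphi(t) = \begin{cases} 1 - |t|, & |t| \le 1/2, \\ 1/(4|t|), & |t| > 1/2, \end{cases}
\end{gathered}
\]

can be claimed to be a characteristic function?

Hint. In order to demonstrate that some of the above functions are not characteristic,
use [P §2.12, Theorem 1] and also the inequalities established in
Problem 2.12.21 below.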


Problem 2.12.13. Prove that the function $t \mapsto \varphi(t)$, given by

\[
\varphi(t) =
\begin{cases}
\sqrt{1 - t^2}, & |t| \le 1, \\
0, & |t| > 1,
\end{cases}
\]

cannot be identified with the characteristic function of any random variable.
Can one make the same claim about the function t ~ g(t) = mig 


Problem 2.12.14. Prove that if the function $t \mapsto \varphi(t)$ is characteristic, then so is
also the function $t \mapsto |\varphi(t)|^2$.


Problem 2.12.15. Prove that if the function $t \mapsto \varphi(t)$ is characteristic, then so is
also the function $t \mapsto e^{\lambda(\varphi(t) - 1)}$ for every $\lambda > 0$. Can one claim that the function


t ~> g(t) = eX") is characteristic? 


Problem 2.12.16. Prove that if $t \mapsto \varphi(t)$ is a characteristic function, then the
following functions must be characteristic, too:

\[ \int_0^1 \varphi(ut)\,du, \qquad \int_0^{\infty} e^{-u} \varphi(ut)\,du. \]


Problem 2.12.17. Prove that for every $n \ge 1$ the function

\[ \varphi_n(t) = \frac{e^{it} - \sum_{k=0}^{n-1} (it)^k/k!}{(it)^n/n!} \]

can be identified with the characteristic function of some random variable.


Problem 2.12.18. Let $\varphi_{X_n}(t)$ be the characteristic function of the random variable
$X_n$, which is uniformly distributed in the interval $(-n, n)$. Prove that

\[
\lim_{n \to \infty} \varphi_{X_n}(t) =
\begin{cases}
1, & t = 0, \\
0, & t \ne 0.
\end{cases}
\]
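
The limit is easy to see from the explicit formula $\varphi_{X_n}(t) = \sin(nt)/(nt)$ for $t \ne 0$, which is bounded in absolute value by $1/(n|t|)$. The sketch below simply tabulates this expression for growing $n$; the function name `uniform_cf` and the chosen values of $t$ and $n$ are assumptions made for the illustration.

```python
import math

def uniform_cf(t, n):
    """Characteristic function of the uniform law on (-n, n): sin(nt)/(nt)."""
    if t == 0:
        return 1.0
    return math.sin(n * t) / (n * t)

for t in (0.0, 0.1, 1.0):
    print(t, [uniform_cf(t, n) for n in (1, 10, 100, 1000)])
```

The point-wise limit is discontinuous at $t = 0$, so it is not a characteristic function; this matches the fact that the distributions of $X_n$ spread out and do not converge weakly to any probability distribution.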


Problem 2.12.19. Let $(m^{(n)})_{n \ge 1}$ be the sequence of all moments of the random
variable $X$, which has distribution function $F = F(x)$, i.e., $m^{(n)} = \int_{-\infty}^{\infty} x^n\,dF(x)$.
Prove that if the series $\sum_{n=1}^{\infty} \frac{m^{(n)}}{n!} s^n$ converges absolutely for some $s > 0$, then
the sequence $(m^{(n)})_{n \ge 1}$ uniquely defines the distribution function $F = F(x)$.


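Problem 2.12.20. Let $F = F(x)$ be any distribution function and let $\varphi(t) =
\int_{-\infty}^{\infty} e^{itx}\,dF(x)$ be its characteristic function. Prove that

\[ \lim_{c \to \infty} \frac{1}{2c} \int_{-c}^{c} e^{-itx} \varphi(t)\,dt = F(x) - F(x-) \]

and

\[ \lim_{c \to \infty} \frac{1}{2c} \int_{-c}^{c} |\varphi(t)|^2\,dt = \sum_{x \in R} [F(x) - F(x-)]^2. \]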


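In particular, the distribution function $F = F(x)$ can be claimed to be continuous
if and only if its characteristic function $\varphi(t)$ satisfies the condition

\[ \lim_{c \to \infty} \frac{1}{c} \int_{-c}^{c} |\varphi(t)|^2\,dt = 0. \]


Problem 2.12.21. Prove that any characteristic function $\varphi = \varphi(t)$ must satisfy the
following inequalities:

\[
1 - \mathrm{Re}\,\varphi(nt) \le n[1 - (\mathrm{Re}\,\varphi(t))^n] \le n^2 [1 - \mathrm{Re}\,\varphi(t)], \quad n = 0, 1, 2, \ldots; \tag{$*$}
\]

\[
\begin{gathered}
|\mathrm{Im}\,\varphi(t)|^2 \le \tfrac{1}{2}[1 - \mathrm{Re}\,\varphi(2t)]; \qquad 1 + \mathrm{Re}\,\varphi(2t) \ge 2(\mathrm{Re}\,\varphi(t))^2; \\
|\varphi(t) - \varphi(s)|^2 \le 4\varphi(0)\,|1 - \varphi(t - s)|; \qquad 1 - |\varphi(2t)|^2 \le 4(1 - |\varphi(t)|^2); \\
|\varphi(t) - \varphi(s)|^2 \le 2[1 - \mathrm{Re}\,\varphi(t - s)]; \\
\frac{1}{2h} \Big| \int_{t-h}^{t+h} \varphi(u)\,du \Big| \le (1 + \mathrm{Re}\,\varphi(h))^{1/2}, \quad t > 0.
\end{gathered}
\]

(The last two relations are known as the Raikov inequalities.)

Hint. The proof is based on the relation $\varphi(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$ (and the
associated relations for $\mathrm{Re}\,\varphi(t)$ and $\mathrm{Im}\,\varphi(t)$). Thus, for example, in order to prove
the inequality

\[ 1 - \mathrm{Re}\,\varphi(2t) \le 4[1 - \mathrm{Re}\,\varphi(t)] \tag{$**$} \]

(a special case of $(*)$ with $n = 2$) it is enough to notice that

\[
1 - \mathrm{Re}\,\varphi(2t) = \int_{-\infty}^{\infty} (1 - \cos 2tx)\,dF(x) \quad \text{and} \quad 1 - \cos 2tx \le 4(1 - \cos tx).
\]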


Problem 2.12.22. Suppose that the characteristic function $\varphi = \varphi(t)$ is such that
$\varphi(t) = 1 + f(t) + o(t^2)$ as $t \to 0$, where $f(t) = -f(-t)$. Prove that $\varphi(t) \equiv 1$.
Hint. Use the relation $(**)$ in the previous problem.


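Problem 2.12.23. Let $\varphi = \varphi(t)$ be the characteristic function of some random
variable $X$, which has distribution function $F = F(x)$.

(a) Prove that $\int_{-\infty}^{\infty} |x|\,dF(x) < \infty$ if and only if $\int_{-\infty}^{\infty} \frac{1 - \mathrm{Re}\,\varphi(t)}{t^2}\,dt < \infty$ and
that these conditions imply

\[ E|X| = \int_{-\infty}^{\infty} |x|\,dF(x) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \mathrm{Re}\,\varphi(t)}{t^2}\,dt. \]

(b) Prove that if $\int_{-\infty}^{\infty} |x|\,dF(x) < \infty$ then one has

\[ E|X| = \int_{-\infty}^{\infty} |x|\,dF(x) = -\frac{2}{\pi} \int_{0}^{\infty} \frac{\mathrm{Re}\,\varphi'(t)}{t}\,dt. \]

(See Problem 2.12.37.)

Hint. (a) Use the following easy to check formula

\[ |x| = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos xt}{t^2}\,dt. \]

(b) Use the fact that $|x| = x \operatorname{sign} x$, where

\[
\operatorname{sign} x =
\begin{cases}
1, & x > 0, \\
0, & x = 0, \\
-1, & x < 0,
\end{cases}
\]

in conjunction with the relation

\[ \operatorname{sign} x = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin xt}{t}\,dt. \]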


Problem 2.12.24. Consider a characteristic function of the form $\varphi(t) = 1 +
O(|t|^{\alpha})$ as $t \to 0$, where $\alpha \in (0, 2]$. Prove that if $\xi$ is a random variable with
characteristic function $\varphi(t)$, then the following property must hold:

\[
P\{|\xi| > x\} = O(x^{-\alpha}) \quad \text{as } x \to \infty.
\]


Problem 2.12.25. Let $X$ and $Y$ be any two independent and identically distributed
random variables with vanishing means and standard deviations equal to 1. By using
characteristic functions prove that if the distribution of the random variable
$(X + Y)/\sqrt{2}$ coincides with the distribution of $X$ and $Y$, then $X$ and $Y$ must be Gaussian.


Problem 2.12.26. The Laplace transform of a non-negative random variable $X$,
with distribution function $F = F(x)$, is defined (see Problem 2.6.32) as the function
$\widehat{F} = \widehat{F}(\lambda)$, $\lambda > 0$, given by

\[
\widehat{F}(\lambda) = E e^{-\lambda X} = \int_{[0,\infty)} e^{-\lambda x}\, dF(x), \quad \lambda > 0.
\]

Prove the following criterion, which is due to S. N. Bernstein: the function
$\widehat{F} = \widehat{F}(\lambda)$, defined on $(0, \infty)$, is the Laplace transform of some distribution function
$F = F(x)$ on $(0, \infty)$ if and only if $\widehat{F}$ is completely monotone, in the sense that all
derivatives $\widehat{F}^{(n)}(\lambda)$, $n \ge 0$, exist and satisfy $(-1)^n \widehat{F}^{(n)}(\lambda) \ge 0$.


Problem 2.12.27. Suppose that the distribution function $F = F(x)$ admits density
$f = f(x)$, has characteristic function $\varphi = \varphi(t)$, and suppose that at least one of
the following conditions holds:

\[
\text{(a)} \ \int_{-\infty}^{\infty} |\varphi(t)|\, dt < \infty \qquad \text{or} \qquad \text{(b)} \ \int_{-\infty}^{\infty} f^2(x)\, dx < \infty.
\]

Prove Parseval's formula:

\[
\int_{-\infty}^{\infty} f^2(x)\, dx = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\varphi(t)|^2\, dt .
\]

(Comp. with Parseval's identity; see [P §2.11, (14)].)


Problem 2.12.28. Prove that if the distribution function $F = F(x)$ has density
$f = f(x)$, then its characteristic function $\varphi = \varphi(t)$ must be such that $\varphi(t) \to 0$ as
$t \to \infty$.


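Problem 2.12.29. Let $F = F(x)$ and $\widetilde{F} = \widetilde{F}(x)$ be any two distribution functions
on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ and let $\varphi(t)$ and $\widetilde{\varphi}(t)$ be their respective characteristic functions.
Prove Parseval's relation: for every $t \in \mathbb{R}$ one has

\[
\int_{-\infty}^{\infty} \widetilde{\varphi}(x - t)\, dF(x) = \int_{-\infty}^{\infty} e^{-ity} \varphi(y)\, d\widetilde{F}(y). \tag{$*$}
\]

In particular, if $\widetilde{F}$ is the distribution function associated with the normal distribution
law $\mathscr{N}(0, \sigma^2)$, then

\[
\int_{-\infty}^{\infty} e^{-\frac{\sigma^2 (x-t)^2}{2}}\, dF(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-ity}\, e^{-\frac{y^2}{2\sigma^2}}\, \varphi(y)\, dy. \tag{$**$}
\]

(Comp. with the result in Problem 2.12.40.)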


Problem 2.12.30. By using $(**)$ in the previous problem, conclude that if the
distribution functions $F_1$ and $F_2$ share the same characteristic function, then one
must have $F_1 = F_2$. (Comp. with the result in Problem 2.12.41.)


Problem 2.12.31. By using Parseval's relation $(*)$ in Problem 2.12.29, prove the
following result: if $\varphi(t)$ is the characteristic function of the random variable $\xi$, then
the Laplace transform of the random variable $|\xi|$ is given by the formula

\[
E e^{-\lambda |\xi|} = \int_{-\infty}^{\infty} \frac{\lambda}{\pi(\lambda^2 + t^2)}\, \varphi(t)\, dt, \quad \lambda > 0.
\]

(Comp. with the statement in Problem 2.12.23.)


Problem 2.12.32. Let $F = F(x)$ be any distribution function and let $\varphi(t) =
\int_{-\infty}^{\infty} e^{itx}\, dF(x)$ be its characteristic function. According to [P §2.12, Theorem
3-b)], the property $\int_{-\infty}^{\infty} |\varphi(t)|\, dt < \infty$ guarantees the existence of a continuous
density $f(x)$. Give an example of a distribution function $F = F(x)$ which admits
a continuous density, and yet $\int_{-\infty}^{\infty} |\varphi(t)|\, dt = \infty$.

Problem 2.12.33. Let $\varphi(t) = \int_{\mathbb{R}} e^{itx}\, dF(x)$ be some characteristic function.
According to [P §2.12, Theorem 1], if $\int_{\mathbb{R}} |x|\, dF(x) < \infty$, then $\varphi(t)$ must be
differentiable. By using appropriate examples, prove that, in general, the converse
statement does not hold. Prove that, in fact, it is possible to find a characteristic
function $\varphi(t)$, which is infinitely differentiable, and yet $\int_{\mathbb{R}} |x|\, dF(x) = \infty$.


Problem 2.12.34. (The "inversion formula.") By using the argument in the proof
of [P §2.12, Theorem 3], prove that, for any distribution function $F = F(x)$ and
any $a < b$, the following general "inversion formula" is in force:

\[
\lim_{c\to\infty} \frac{1}{2\pi} \int_{-c}^{c} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt = \frac{1}{2}\,[F(b) + F(b-)] - \frac{1}{2}\,[F(a) + F(a-)].
\]


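Problem 2.12.35. (a) Prove that the probability distribution with density

\[
f(x) = \frac{1 - \cos x}{\pi x^2}, \quad x \in \mathbb{R},
\]

has characteristic function given by

\[
\varphi(t) = \begin{cases} 1 - |t|, & |t| \le 1, \\ 0, & |t| > 1. \end{cases}
\]

(b) What is the characteristic function of the distribution with density

\[
f(x) = \frac{1 - \cos \pi x}{\pi^2 x^2}, \quad x \in \mathbb{R}?
\]

(c) Prove that the characteristic functions of the probability densities

\[
f_1(x) = \frac{1}{\pi \cosh x} \quad \text{and} \quad f_2(x) = \frac{1}{2 \cosh^2 x}, \quad x \in \mathbb{R},
\]

are given, respectively, by

\[
\varphi_1(t) = \frac{1}{\cosh \frac{1}{2}\pi t} \quad \text{and} \quad \varphi_2(t) = \frac{\pi t}{2 \sinh \frac{1}{2}\pi t},
\]

where $\cosh y = (e^{y} + e^{-y})/2$ and $\sinh y = (e^{y} - e^{-y})/2$.

(d) Find the probability distributions associated with the following characteristic
functions:

\[
\frac{1 + it}{1 + t^2}, \qquad \frac{1 - it}{1 + t^2}, \qquad \cos\frac{t}{2}, \qquad \frac{2}{3e^{it} - 1}, \qquad \frac{1}{2} e^{-it} + \frac{1}{3} + \frac{1}{6} e^{2it}.
\]

Problem 2.12.36. Let $m^{(k)} = \int_{-\infty}^{\infty} x^k\, dF(x)$, $k \ge 1$, be the moments of the
probability distribution $F = F(x)$. Prove that

\[
\int_{-\infty}^{\infty} \cosh(ax)\, dF(x) = \sum_{k} \frac{a^{2k}}{(2k)!}\, m^{(2k)} .
\]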


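Problem 2.12.37. Suppose that, just as in Problem 2.12.23, $\varphi = \varphi(t)$ is the
characteristic function of some random variable $X$, which has distribution function
$F = F(x)$. Prove that

\[
\int_{-\infty}^{\infty} |x|^{\beta}\, dF(x) < \infty \quad \text{for } \beta \in (0, 2)
\]

if and only if

\[
\int_{-\infty}^{\infty} \frac{1 - \operatorname{Re}\varphi(t)}{|t|^{1+\beta}}\, dt < \infty,
\]

in which case

\[
E|X|^{\beta} = \int_{-\infty}^{\infty} |x|^{\beta}\, dF(x) = C_{\beta} \int_{-\infty}^{\infty} \frac{1 - \operatorname{Re}\varphi(t)}{|t|^{1+\beta}}\, dt,
\]

where

\[
C_{\beta} = \left[ \int_{-\infty}^{\infty} \frac{1 - \cos t}{|t|^{1+\beta}}\, dt \right]^{-1} = \frac{\Gamma(1+\beta)}{\pi} \sin\frac{\beta\pi}{2}.
\]

Hint. Use the relation

\[
|x|^{\beta} = C_{\beta} \int_{-\infty}^{\infty} \frac{1 - \cos xt}{|t|^{1+\beta}}\, dt .
\]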


Problem 2.12.38. Prove the statement in Problem 2.8.27 by calculating the
characteristic functions of the random variables considered there, and the characteristic
function of the random variable $C$ that has Cauchy distribution with density
$\frac{1}{\pi(1 + x^2)}$, $x \in \mathbb{R}$.

Problem 2.12.39. (Non-uniqueness in the problem of moments.) It was shown in
[P §2.12, 9] that it is possible to find two different distribution functions that,
nevertheless, have identical moments of all orders $n \ge 1$. Here is one such
construction in terms of densities.

Let $\eta$ be any standard normally distributed random variable ($\eta \sim \mathscr{N}(0, 1)$) and
let $\xi = e^{\eta}$. Prove that:

(a) The density $f_{\xi}(x)$ is given by the formula

\[
f_{\xi}(x) = \frac{1}{x\sqrt{2\pi}}\, e^{-\frac{(\ln x)^2}{2}}, \quad x > 0
\]

(comp. with [P §2.8, (23)]).

(b) The function

\[
g(x) = f_{\xi}(x)\,[1 + \sin(2\pi \ln x)], \quad x > 0,
\]

is such that $g(x) \ge 0$ and $\int_0^{\infty} g(x)\, dx = 1$.

(c) For all $n \ge 1$ one has

\[
\int_0^{\infty} x^n f_{\xi}(x)\, dx = \int_0^{\infty} x^n g(x)\, dx .
\]
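
A small numerical illustration of part (c), offered as a sketch rather than a solution. After the substitution $y = \ln x$, the relative gap between the $n$-th moments of $g$ and $f_{\xi}$ equals $\int \phi(y - n) \sin(2\pi y)\, dy$, where $\phi$ denotes the standard normal density; the code evaluates this integral with a midpoint rule. The function names, the window width and the step count are choices made here for illustration.

```python
import math

def normal_pdf(y: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def relative_moment_gap(n: int, half_width: float = 10.0, steps: int = 20000) -> float:
    """Midpoint-rule value of the integral of pdf(y - n) * sin(2*pi*y) over [n - w, n + w].

    With y = ln x this is (int x^n g dx - int x^n f dx) / int x^n f dx,
    so a value near zero means the two n-th moments agree.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        y = n - half_width + (k + 0.5) * h
        total += normal_pdf(y - n) * math.sin(2.0 * math.pi * y)
    return total * h

if __name__ == "__main__":
    for n in range(0, 5):
        print(n, relative_moment_gap(n))
```

The case $n = 0$ corresponds to the normalization $\int_0^{\infty} g(x)\, dx = 1$ in part (b).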


Problem 2.12.40. Let $\xi$ and $\eta$ be two independent random variables, such that $\eta$
has standard normal distribution (i.e., $\eta \sim \mathscr{N}(0, 1)$) and let $f = f(x)$ be any
bounded Borel function with compact support. Prove that for every $\sigma > 0$ one has

\[
E f(\xi + \sigma\eta) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{\sigma^2 t^2}{2}}\, \varphi(t)\, \widehat{f}(-t)\, dt, \tag{$*$}
\]

where $\varphi(t) = E e^{it\xi}$ and $\widehat{f}(t) = \int_{-\infty}^{\infty} e^{itx} f(x)\, dx$. (Comp. with the result in
Problem 2.12.29.) Formulate an analogous result for multivariate random variables
$\xi$ and $\eta$.


Problem 2.12.41. By using the relation $(*)$ in the previous problem, prove that the
characteristic function $\varphi(t)$ of any random variable $\xi$ completely determines the
probability distribution of $\xi$. (Comp. with [P §2.12, Theorem 2].)

Hint. Convince yourself that, under the assumptions of the previous problem,
the relation $(*)$ implies that

\[
E f(\xi) = \lim_{\sigma \to 0} \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-\frac{\sigma^2 t^2}{2}}\, \varphi(t)\, \widehat{f}(-t)\, dt, \tag{$**$}
\]

and conclude (using the fact that $f = f(x)$ is an arbitrary bounded function with
compact support) that the characteristic function $\varphi(t)$ indeed uniquely determines
the distribution of the random variable $\xi$. Verify that the relation $(**)$ holds also for
multivariate random variables $\xi$, and obtain a multivariate analog of the relation $(*)$.



Problem 2.12.42. Let $\varphi = \varphi(t)$ be a characteristic function and suppose that for
some $b > 0$ and some $0 < a < 1$ one has

\[
|\varphi(t)| \le a \quad \text{for any } |t| \ge b.
\]

Show Cramér's inequality: for any $|t| < b$ one has

\[
|\varphi(t)| \le 1 - (1 - a^2)\, \frac{t^2}{8b^2}.
\]

Hint. Use the inequality $1 - |\varphi(2t)|^2 \le 4\,(1 - |\varphi(t)|^2)$ from Problem 2.12.21.


Problem 2.12.43. (Addendum to the inequalities in Problem 2.12.21.) Let $F =
F(x)$ be any distribution function and let $\varphi = \varphi(t)$ be its characteristic function.
Show the von Bahr–Esseen inequality:

\[
|1 - \varphi(t)| \le C_r\, \beta^{(r)}\, |t|^{r} \quad \text{for every } 1 \le r \le 2,
\]

where $\beta^{(r)} = \int_{-\infty}^{\infty} |x|^{r}\, dF(x)$ and $C_r$ is some constant.


Problem 2.12.44. For integer numbers $n \ge 1$ the moments $m^{(n)} = E X^n$ and the
absolute moments $\beta_n = E|X|^n$ of the random variable $X$ can be expressed in terms
of derivatives of order at most $n$ of the characteristic function $\varphi(t) = E e^{itX}$, $t \in \mathbb{R}$
(see formula [P §2.12, (13)] or (c) in Problem 2.12.23). In order to obtain a similar
representation for the moments $m^{(\alpha)} = E X^{\alpha}$ and $\beta_{\alpha} = E|X|^{\alpha}$ for arbitrary $\alpha > 0$,
one must resort to fractional derivatives, as explained below.

Let $\alpha = n + a$, for some integer number $n$ and some $0 < a < 1$. The fractional
derivative $D^{(\alpha)} f(t)$ $\big(= \frac{d^{\alpha}}{dt^{\alpha}} f(t)\big)$ of the function $f = f(t)$, $t \in \mathbb{R}$, is defined as
the function

\[
\frac{a}{\Gamma(1-a)} \int_{-\infty}^{t} \frac{f^{(n)}(t) - f^{(n)}(s)}{(t-s)^{1+a}}\, ds,
\]

assuming that the integral in the above expression is well defined for any $t \in \mathbb{R}$. In
particular, if $f(t) = \varphi(t)$ $\big(= \int_{-\infty}^{\infty} e^{itx}\, dF(x)\big)$ is some characteristic function, then

\[
D^{(n+a)} \varphi(t)\big|_{t=0} = \frac{a}{\Gamma(1-a)} \int_{0}^{\infty} \frac{\varphi^{(n)}(0) - \varphi^{(n)}(-u)}{u^{1+a}}\, du
\]
\[
= \frac{i^{n} a}{\Gamma(1-a)} \int_{0}^{\infty} \left[ \int_{-\infty}^{\infty} x^{n} (1 - \cos ux)\, dF(x) + i \int_{-\infty}^{\infty} x^{n} \sin ux\, dF(x) \right] \frac{du}{u^{1+a}}. \tag{$*$}
\]

Prove that for even numbers $n$ the absolute moments $\beta_{n+a}$ are finite, i.e., $\beta_{n+a} < \infty$,
if and only if:

(i) $\beta_n < \infty$;

(ii) $\operatorname{Re}\big[D^{(n+a)} \varphi(t)\big|_{t=0}\big]$ exists.

Prove that when these conditions hold one must have

\[
\beta_{n+a} = \frac{1}{\cos\frac{a\pi}{2}}\, \operatorname{Re}\Big[ (-1)^{n/2} D^{(n+a)} \varphi(t)\big|_{t=0} \Big].
\]

Hint. Use $(*)$ and the fact that for every $0 < b < 2$ one has the following
formula:

\[
\int_{0}^{\infty} \frac{1 - \cos u}{u^{1+b}}\, du = -\Gamma(-b) \cos\frac{b\pi}{2}.
\]

Remark. A detailed discussion of the calculation of $E|X|^{n+a}$, for arbitrary $n \ge 0$
and $0 < a < 1$, can be found in the book [84].


Problem 2.12.45. Prove that the following inequality is in force for every
characteristic function $\varphi = \varphi(t)$ and every $u$ and $s$:

\[
|\varphi(u + s)| \ge |\varphi(u)| \cdot |\varphi(s)| - [1 - |\varphi(u)|^2]^{1/2}\, [1 - |\varphi(s)|^2]^{1/2}.
\]

Hint. Use Bochner–Khinchin's theorem (see [P §2.12, 6]).


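Problem 2.12.46. Suppose that $(\xi, \eta)$ is a pair of random variables with joint
density

\[
f(x, y) = \frac{1}{4}\,\{1 + xy\,(x^2 - y^2)\}\, I(|x| < 1,\ |y| < 1).
\]

Prove that $\xi$ and $\eta$ are two dependent random variables with densities

\[
f_{\xi}(x) = \frac{1}{2}\, I(|x| < 1), \qquad f_{\eta}(y) = \frac{1}{2}\, I(|y| < 1).
\]

Show also that the characteristic function, $\varphi_{\xi+\eta}(t)$, of the sum $\xi + \eta$, equals the
product of the characteristic functions $\varphi_{\xi}(t)$ and $\varphi_{\eta}(t)$, i.e., $\varphi_{\xi+\eta}(t) = \varphi_{\xi}(t)\, \varphi_{\eta}(t)$.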
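
The product identity can be checked numerically before it is proved. The following sketch assumes the joint density $\frac{1}{4}\{1 + xy(x^2 - y^2)\}$ on the square $|x| < 1$, $|y| < 1$, approximates $E e^{it(\xi+\eta)}$ by a two-dimensional midpoint rule, and compares it with the square of the characteristic function $\sin t / t$ of the uniform law on $(-1, 1)$. The grid size and function names are illustrative choices.

```python
import cmath
import math

def joint_density(x: float, y: float) -> float:
    """Joint density on the open square |x| < 1, |y| < 1."""
    return 0.25 * (1.0 + x * y * (x * x - y * y))

def cf_of_sum(t: float, m: int = 200) -> complex:
    """Midpoint-rule approximation of E exp(i t (xi + eta)) on an m-by-m grid."""
    h = 2.0 / m
    total = 0j
    for i in range(m):
        x = -1.0 + (i + 0.5) * h
        for j in range(m):
            y = -1.0 + (j + 0.5) * h
            total += cmath.exp(1j * t * (x + y)) * joint_density(x, y)
    return total * h * h

def cf_uniform(t: float) -> float:
    """Characteristic function of the uniform distribution on (-1, 1)."""
    return 1.0 if t == 0 else math.sin(t) / t

if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, abs(cf_of_sum(t) - cf_uniform(t) ** 2))
    # The joint density is not the product of the marginals (1/2 * 1/2 = 1/4):
    print(joint_density(0.5, 0.25))
```

The last line prints a value different from 1/4, which is the dependence claimed in the problem, while the printed gaps stay at the level of the quadrature error.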


Problem 2.12.47. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent and identically
distributed random variables that take the values $0, 1, \ldots, 9$ with probability $1/10$
and let

\[
X_n = \sum_{k=1}^{n} \frac{\xi_k}{10^k}, \quad n \ge 1.
\]

Prove that the sequence $(X_n)_{n \ge 1}$ converges not only in distribution, but also almost
surely to a random variable that is uniformly distributed in the interval $[0, 1]$.
Hint. Use the method of characteristic functions.
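
To see the method of characteristic functions at work here, one can compare numerically the characteristic function of a partial sum of random decimal digits with that of the uniform law on $[0, 1]$. The sketch below assumes the partial sums are $X_n = \sum_{k=1}^{n} \xi_k / 10^k$ and uses hypothetical function names.

```python
import cmath

def cf_digit_sum(t: float, n: int) -> complex:
    """Characteristic function of X_n = sum_{k=1}^n xi_k / 10^k with uniform digits xi_k."""
    value = 1 + 0j
    for k in range(1, n + 1):
        step = t / 10 ** k
        # Each factor is the characteristic function of xi_k / 10^k.
        value *= sum(cmath.exp(1j * step * d) for d in range(10)) / 10
    return value

def cf_uniform01(t: float) -> complex:
    """Characteristic function of the uniform distribution on [0, 1]."""
    if t == 0:
        return 1 + 0j
    return (cmath.exp(1j * t) - 1) / (1j * t)

if __name__ == "__main__":
    for t in (0.7, 3.0, -5.0):
        print(t, abs(cf_digit_sum(t, 12) - cf_uniform01(t)))
```

The printed gaps shrink roughly like $|t| \cdot 10^{-n}$, in line with convergence in distribution; the almost sure convergence still has to be argued separately.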
Problem 2.13.1. Prove that, given any Gaussian system of random variables,
$(\xi, \eta_1, \ldots, \eta_n)$, the conditional expectation $E(\xi \mid \eta_1, \ldots, \eta_n)$ coincides with the
conditional expectation in the wide sense $\widehat{E}(\xi \mid \eta_1, \ldots, \eta_n)$.

Problem 2.13.2. Let $(\xi, \eta_1, \ldots, \eta_k)$ be a Gaussian system. Describe the structure
of the conditional expectations $E(\xi^n \mid \eta_1, \ldots, \eta_k)$, $n \ge 1$, as functions of the random
variables $\eta_1, \ldots, \eta_k$.


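Problem 2.13.3. Let $X = (X_k)_{1 \le k \le n}$ and $Y = (Y_k)_{1 \le k \le n}$ be two Gaussian random
sequences with $E X_k = E Y_k$, $D X_k = D Y_k$, $1 \le k \le n$, and

\[
\operatorname{cov}(X_k, X_l) \le \operatorname{cov}(Y_k, Y_l), \quad 1 \le k, l \le n.
\]

Prove Slepian's inequality: for every $x \in \mathbb{R}$ one has

\[
P\Big\{ \sup_{1 \le k \le n} X_k < x \Big\} \le P\Big\{ \sup_{1 \le k \le n} Y_k < x \Big\}.
\]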


Problem 2.13.4. Let $\xi_1, \xi_2, \xi_3$ be three independent standard Gaussian random
variables, i.e., $\xi_i \sim \mathscr{N}(0, 1)$, $i = 1, 2, 3$. Prove that

\[
\frac{\xi_1 + \xi_2 \xi_3}{\sqrt{1 + \xi_3^2}} \sim \mathscr{N}(0, 1).
\]

(This gives rise to the interesting problem of describing the family of all nonlinear
transformations of a given family of independent Gaussian random variables,
$\xi_1, \ldots, \xi_n$, $n \ge 2$, that yield a Gaussian distribution.)
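
A simulation can make the claim of Problem 2.13.4 plausible before it is proved. The sketch below is a sanity check only: it draws samples of $(\xi_1 + \xi_2\xi_3)/\sqrt{1 + \xi_3^2}$ and compares the first, second and fourth empirical moments with the values 0, 1 and 3 of the standard normal law. The sample size, seed and function names are arbitrary choices.

```python
import math
import random

def sample_statistic(rng: random.Random) -> float:
    """One draw of (xi1 + xi2*xi3) / sqrt(1 + xi3^2) with independent N(0, 1) inputs."""
    x1, x2, x3 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (x1 + x2 * x3) / math.sqrt(1.0 + x3 * x3)

def empirical_moments(n_samples: int = 200_000, seed: int = 12345):
    """Return the empirical first, second and fourth moments of the statistic."""
    rng = random.Random(seed)
    s1 = s2 = s4 = 0.0
    for _ in range(n_samples):
        v = sample_statistic(rng)
        s1 += v
        s2 += v * v
        s4 += v ** 4
    return s1 / n_samples, s2 / n_samples, s4 / n_samples

if __name__ == "__main__":
    print(empirical_moments())
```

Matching three moments does not identify the distribution, so this is evidence, not a proof.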


Problem 2.13.5. In the context of [P §2.13], prove that the "matrix" $R =
(r(s, t))_{s, t \in \mathfrak{A}}$, associated with the function $r(s, t)$ from [P §2.13, (25), (29) and
(30)], is non-negative definite.


Problem 2.13.6. Let $A$ be any matrix of order $m \times n$. We say that the matrix $A^{\oplus}$,
of order $n \times m$, is the pseudo-inverse of the matrix $A$, if one can find two matrices,
$U$ and $V$, such that

\[
A A^{\oplus} A = A, \qquad A^{\oplus} = U A^{*} = A^{*} V.
\]

Prove that the matrix $A^{\oplus}$, defined by the above conditions, exists and is unique.


Problem 2.13.7. Prove that formulas (19) and (20) in the Theorem on Normal
Correlation (Theorem 2 in [P §2.13, 4]) remain valid in the case of a degenerate
matrix $D_{\xi\xi}$, provided that the inverse $D_{\xi\xi}^{-1}$ is replaced by the pseudo-inverse $D_{\xi\xi}^{\oplus}$.

Problem 2.13.8. Let $(\theta, \xi) = (\theta_1, \ldots, \theta_k; \xi_1, \ldots, \xi_l)$ be a Gaussian vector and
suppose that the matrix $\Delta = D_{\theta\theta} - D_{\theta\xi} D_{\xi\xi}^{\oplus} D_{\theta\xi}^{*}$ is non-degenerate. Prove that the
conditional distribution function $P(\theta \le a \mid \xi) = P(\theta_1 \le a_1, \ldots, \theta_k \le a_k \mid \xi)$
admits density given by (P-a. e.)

\[
p(a_1, \ldots, a_k \mid \xi) = \frac{|\Delta|^{-1/2}}{(2\pi)^{k/2}} \exp\Big\{ -\frac{1}{2}\, (a - E(\theta \mid \xi))^{*}\, \Delta^{-1}\, (a - E(\theta \mid \xi)) \Big\}.
\]


Problem 2.13.9. Let $\xi$ and $\eta$ be two independent standard Gaussian random
variables (i.e., Gaussian random variables with vanishing mean and standard deviation
equal to 1).

(a) Prove that the random variables $\xi + \eta$ and $\xi - \eta$ are also independent and
Gaussian.

(b) By using (a) and the result in Problem 2.8.27, prove that


C law E+N law lg law 1+C law 1 
= =n Tee 1-C C 
where $C$ is a random variable with Cauchy density $\frac{1}{\pi(1 + x^2)}$ (recall that $\overset{\text{law}}{=}$ stands
for "equality in distribution").


Problem 2.13.10. (S. N. Bernstein.) Let $\xi$ and $\eta$ be any two independent and
identically distributed random variables with finite variance. Prove that if $\xi + \eta$
and $\xi - \eta$ are independent, then $\xi$ and $\eta$ must be Gaussian. (For a generalization of
this result, see the Darmois–Skitovich Theorem stated in Problem 2.13.44.)

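Hint. Use the following line of reasoning (by $\varphi_{\zeta}(t)$ we denote the characteristic
function of the random variable $\zeta$):

(a) From $\varphi_{\xi}(t) = \varphi_{\frac{\xi+\eta}{2}}(t)\, \varphi_{\frac{\xi-\eta}{2}}(t) = \varphi_{\xi+\eta}(t/2)\, \varphi_{\xi-\eta}(t/2)$ conclude that

\[
\varphi_{\xi}(t) = \Big[\varphi_{\xi}\Big(\frac{t}{2}\Big)\Big]^2\, \Big|\varphi_{\eta}\Big(\frac{t}{2}\Big)\Big|^2, \quad t \in \mathbb{R},
\]

and, analogously,

\[
\varphi_{\eta}(t) = \Big[\varphi_{\eta}\Big(\frac{t}{2}\Big)\Big]^2\, \Big|\varphi_{\xi}\Big(\frac{t}{2}\Big)\Big|^2, \quad t \in \mathbb{R}.
\]

(b) By using (a) conclude that $|\varphi_{\xi}(t)| = |\varphi_{\eta}(t)|$ and that $|\varphi_{\eta}(t)| = \big|\varphi_{\eta}\big(\frac{t}{2}\big)\big|^4$.

(c) By using (b) conclude that $\varphi_{\eta}(t) \ne 0$ for every $t \in \mathbb{R}$, so that one can define
the function $f(t) = \ln|\varphi_{\eta}(t)|$, with $f(t) = 4 f(t/2)$, $t \in \mathbb{R}$.

(d) From $E\eta^2 < \infty$ conclude that $\varphi_{\eta}(t) \in C^2(\mathbb{R})$ and by using (c) conclude that

\[
f''(t) = f''\Big(\frac{t}{2}\Big) = \ldots = f''\Big(\frac{t}{2^n}\Big) \to f''(0), \quad t \in \mathbb{R},
\]

so that $f''(t) = \text{const}$.

(e) By using (d) conclude that $f(t) = at^2 + bt + c$, which, in conjunction with
(c), gives $f(t) = at^2$.

(f) By using (e) conclude that $\varphi_{\eta}(t) = e^{i\alpha(t) + at^2}$, where the function $\alpha(t)$ should
be continuous as long as $\varphi_{\eta}(t)$ is continuous.

(g) Convince yourself that $\alpha(t)$ has the property

\[
\alpha(t) = 2\alpha\Big(\frac{t}{2}\Big), \quad t \in \mathbb{R}.
\]

(h) By using the relation $E\eta^2 < \infty$ conclude that $\varphi_{\eta}(t)$ is differentiable at 0 and
by using (g) conclude that as $k \to \infty$ one must have

\[
\frac{\alpha(t)}{t} = \frac{\alpha(t/2^k)}{t/2^k} \to \alpha'(0), \quad t \ne 0,
\]

which shows that $\alpha(t) = \alpha'(0)\, t$.

As a result, $\varphi_{\eta}(t) = e^{i\alpha'(0) t + at^2}$, i.e., $\eta$ has Gaussian distribution. With a similar
line of reasoning one can show that the random variable $\xi$ is also Gaussian.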


Problem 2.13.11. (Mercer Theorem.) Let $r = r(s, t)$ be any continuous
covariance function defined on the rectangle $[a, b] \times [a, b]$, where $-\infty < a < b < \infty$.
Prove that the equation

\[
\lambda \int_{a}^{b} r(s, t)\, u(t)\, dt = u(s), \quad a \le s \le b,
\]

admits a continuous solution, $u(t)$, for infinitely many values $\lambda = \lambda_k > 0$, $k \ge 1$,
and the respective system of solutions $\{u_k = u_k(s),\ k \ge 1\}$ forms a complete
orthonormal system in $L^2(a, b)$, such that

\[
r(s, t) = \sum_{k=1}^{\infty} \frac{u_k(s)\, u_k(t)}{\lambda_k},
\]

where the series converges absolutely and uniformly on $[a, b] \times [a, b]$.


Problem 2.13.12. Let $X = \{X_t,\ t \ge 0\}$ be any Gaussian process with $E X_t = 0$ and
with covariance function $r(s, t) = e^{-|t-s|}$, $s, t \ge 0$. Given any $0 < t_1 < \cdots < t_n$, let
$f_{t_1, \ldots, t_n}(x_1, \ldots, x_n)$ denote the (joint) density of the random variables $X_{t_1}, \ldots, X_{t_n}$.
Prove that this density admits the following representation:

\[
f_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = \Big[ (2\pi)^n \prod_{i=2}^{n} \big(1 - e^{2(t_{i-1} - t_i)}\big) \Big]^{-1/2}
\times \exp\Big\{ -\frac{x_1^2}{2} - \frac{1}{2} \sum_{i=2}^{n} \frac{\big(x_i - e^{(t_{i-1} - t_i)} x_{i-1}\big)^2}{1 - e^{2(t_{i-1} - t_i)}} \Big\}.
\]



Problem 2.13.13. Let $f = \{f_n = f_n(u),\ n \ge 1,\ u \in [0, 1]\}$ be a complete
orthonormal (for the Lebesgue measure on $[0, 1]$) system of $L^2$-functions and let
$(\xi_n)_{n \ge 1}$ be any sequence of independent and identically distributed $\mathscr{N}(0, 1)$-random
variables. Prove that the process $B_t = \sum_{n \ge 1} \xi_n \int_0^t f_n(u)\, du$, $0 \le t \le 1$, is a
Brownian motion.


Problem 2.13.14. Prove that if $B^{\circ} = (B^{\circ}_t)_{0 \le t \le 1}$ is a Brownian bridge process,
then the process $B = (B_t)_{t \ge 0}$ given by $B_t = (1 + t)\, B^{\circ}_{t/(1+t)}$ is a Brownian motion.


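Problem 2.13.15. Verify that if $B = (B_t)_{t \ge 0}$ is a Brownian motion, then each of
the following processes is also a Brownian motion:

\[
B^{(1)}_t = -B_t, \quad t \ge 0;
\]
\[
B^{(2)}_t = t B_{1/t}, \quad t > 0, \qquad B^{(2)}_0 = 0;
\]
\[
B^{(3)}_t = B_{t+s} - B_s, \quad s > 0,\ t \ge 0;
\]
\[
B^{(4)}_t = B_T - B_{T-t} \quad \text{for } 0 \le t \le T,\ T > 0;
\]
\[
B^{(5)}_t = \frac{1}{a}\, B_{a^2 t}, \quad a > 0,\ t \ge 0 \quad \text{(scaling property)}.
\]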


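Problem 2.13.16. Let $B^{\mu} = (B_t + \mu t)_{t \ge 0}$ be a Brownian motion with drift.

(a) Find the distribution of the random variables $B^{\mu}_{t_1} + B^{\mu}_{t_2}$, for $t_1 < t_2$.

(b) Calculate $E B^{\mu}_{t_0} B^{\mu}_{t_1}$ and $E B^{\mu}_{t_0} B^{\mu}_{t_1} B^{\mu}_{t_2}$, for $t_0 < t_1 < t_2$.
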


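Problem 2.13.17. Consider the process $B^{\mu}$ from the previous problem and
calculate the conditional distributions

\[
P(B^{\mu}_{t_2} \in \cdot \mid B^{\mu}_{t_1}), \quad \text{for } t_1 < t_2 \text{ and } t_1 > t_2,
\]

and

\[
P(B^{\mu}_{t_1} \in \cdot \mid B^{\mu}_{t_0}, B^{\mu}_{t_2}), \quad \text{for } t_0 < t_1 < t_2.
\]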


Problem 2.13.18. Let $B = (B_t)_{t \ge 0}$ be a Brownian motion process. Prove that the
process $Y = (Y_t)_{t \in \mathbb{R}}$, given by $Y_t = e^{-t} B_{e^{2t}}$, is an Ornstein–Uhlenbeck process,
i.e., a Gauss–Markov process with $E Y_t = 0$ and $E Y_s Y_t = e^{-|t-s|}$.
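
The covariance claim in Problem 2.13.18 reduces to arithmetic with the Brownian covariance $E B_u B_v = \min(u, v)$. The sketch below (function names are illustrative) computes the covariance of $Y_t = e^{-t} B_{e^{2t}}$ directly from that formula and compares it with $e^{-|t-s|}$; it does not address the Gaussian or Markov properties.

```python
import math

def brownian_cov(u: float, v: float) -> float:
    """Covariance of a standard Brownian motion: E B_u B_v = min(u, v)."""
    return min(u, v)

def time_changed_cov(s: float, t: float) -> float:
    """Covariance of Y_t = exp(-t) * B_{exp(2t)}, obtained from the Brownian covariance."""
    return math.exp(-(s + t)) * brownian_cov(math.exp(2.0 * s), math.exp(2.0 * t))

def ou_cov(s: float, t: float) -> float:
    """Target Ornstein-Uhlenbeck covariance exp(-|t - s|)."""
    return math.exp(-abs(t - s))

if __name__ == "__main__":
    for s, t in [(-1.0, 0.5), (0.0, 0.0), (2.0, 0.3), (1.5, 3.0)]:
        print(s, t, time_changed_cov(s, t), ou_cov(s, t))
```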


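Problem 2.13.19. Let $Y = (Y_t)_{t \in \mathbb{R}}$ be an Ornstein–Uhlenbeck process. Prove that
the process

\[
B^{\circ}_t = \begin{cases} \sqrt{t(1-t)}\; Y_{\frac{1}{2} \ln \frac{t}{1-t}}, & 0 < t < 1, \\ 0, & t = 0, 1, \end{cases}
\]

is a Brownian bridge.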


Problem 2.13.20. Let $\xi_0, \xi_1, \xi_2, \ldots$ be independent and identically distributed
standard Gaussian (i.e., $\mathscr{N}(0, 1)$) random variables. Prove that the series

\[
B^{\circ}_t = \sum_{k=1}^{\infty} \xi_k\, \frac{\sqrt{2}\, \sin k\pi t}{k\pi}, \quad 0 \le t \le 1,
\]

defines a Brownian bridge, while, just as the series in [P §2.13, (26)], the series

\[
B_t = \xi_0 t + \sum_{k=1}^{\infty} \xi_k\, \frac{\sqrt{2}\, \sin k\pi t}{k\pi}, \quad 0 \le t \le 1,
\]

defines a Brownian motion.


Problem 2.13.21. Give a detailed proof of the fact that the processes $(B_t)_{0 \le t \le 1}$,
defined in [P §2.13, (26) and (28)], and the process

\[
B_t = \sqrt{2} \sum_{n=1}^{\infty} \xi_n\, \frac{1 - \cos n\pi t}{n\pi},
\]

where $\xi_n$, $n \ge 1$, are chosen as in [P §2.13, (26) and (28)], are all Brownian
motions.


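Problem 2.13.22. Let $X = (X_k)_{1 \le k \le n}$ be any Gaussian sequence, let

\[
m = \max_{1 \le k \le n} E X_k, \qquad \sigma^2 = \max_{1 \le k \le n} D X_k,
\]

and suppose that

\[
P\Big\{ \max_{1 \le k \le n} (X_k - E X_k) \ge a \Big\} \le 1/2 \quad \text{for some } a.
\]

Prove the following inequality, which is due to E. Borel:

\[
P\Big\{ \max_{1 \le k \le n} X_k > x \Big\} \le 2\Psi\Big( \frac{x - m - a}{\sigma} \Big),
\]

where $\Psi(x) = (2\pi)^{-1/2} \int_{x}^{\infty} e^{-y^2/2}\, dy$.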


Problem 2.13.23. Let $(X, Y)$ be any bi-variate Gaussian random variable
with $EX = EY = 0$, $EX^2 > 0$, $EY^2 > 0$, and with correlation coefficient
$\rho = \frac{EXY}{\sqrt{EX^2\, EY^2}}$.

(a) Prove that the variables $X$ and $Z = (Y - \rho X)/\sqrt{1 - \rho^2}$ are independent and
normally distributed.

(b) Prove that

\[
P\{XY < 0\} = 1 - 2P\{X > 0, Y > 0\} = \pi^{-1} \arccos\rho,
\]

and conclude that

\[
P\{X > 0, Y > 0\} = P\{X < 0, Y < 0\} = \frac{1}{4} + \frac{1}{2\pi} \arcsin\rho,
\]
\[
\frac{\partial}{\partial\rho} P\{X > 0, Y > 0\} = \frac{1}{2\pi\sqrt{1 - \rho^2}},
\]
\[
P\{X > 0, Y < 0\} = P\{X < 0, Y > 0\} = \frac{1}{2\pi} \arccos\rho.
\]

(c) Let $Z = \max(X, Y)$, where $EX^2 = EY^2 = 1$. Prove that

\[
EZ = \sqrt{\frac{1 - \rho}{\pi}}, \qquad EZ^2 = 1.
\]

(d) Prove that for arbitrary $a$ and $b$ one has the following inequalities:

\[
(1 - \Phi(a))(1 - \Phi(c)) \le P\{X > a, Y > b\} \le (1 - \Phi(a))(1 - \Phi(c)) + \frac{\rho\, \varphi(b)\, (1 - \Phi(d))}{\varphi(a)},
\]

where $c = (b - a\rho)/\sqrt{1 - \rho^2}$, $d = (a - b\rho)/\sqrt{1 - \rho^2}$ and $\varphi(x) = \Phi'(x)$ is the
standard normal density.

Hint. Property (b) can be derived from property (a).

Problem 2.13.24. Let $Z = XY$, where $X \sim \mathscr{N}(0, 1)$ and $P\{Y = 1\} = P\{Y =
-1\} = \frac{1}{2}$. Prove that $Z \sim \mathscr{N}(0, 1)$, find the distribution of the pairs $(X, Z)$ and
$(Y, Z)$, and find the distribution of the random variable $X + Z$. Convince yourself
that $X$ and $Z$ are uncorrelated and yet dependent.


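Problem 2.13.25. Let $\xi$ be any standard normal random variable, i.e., $\xi \sim
\mathscr{N}(0, 1)$, and let

\[
\eta_{\alpha} = \begin{cases} \xi, & \text{if } |\xi| \le \alpha, \\ -\xi, & \text{if } |\xi| > \alpha. \end{cases}
\]

Prove that $\eta_{\alpha} \sim \mathscr{N}(0, 1)$ and that with $\alpha$ chosen so that

\[
\int_{0}^{\alpha} x^2 \varphi(x)\, dx = \frac{1}{4} \qquad \Big( \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \Big),
\]

the variables $\xi$ and $\eta_{\alpha}$ are uncorrelated and yet dependent Gaussian random
variables (comp. with [P §2.13, Theorem 1-a)]).

Problem 2.13.26. Let $\xi$ and $\eta$ be two normally distributed random variables with
$E\xi = E\eta = 0$, $E\xi^2 = E\eta^2 = 1$ and $E\xi\eta = \rho$. Prove that:

(a) $E \max(\xi, \eta) = \sqrt{(1 - \rho)/\pi}$;
(b) $E(\xi \mid \eta) = \rho\eta$, $D(\xi \mid \eta) = 1 - \rho^2$;
(c) $E(\xi \mid \xi + \eta = z) = z/2$, $D(\xi \mid \xi + \eta = z) = (1 - \rho)/2$;
(d) $E(\xi + \eta \mid \xi > 0, \eta > 0) = 2\sqrt{2/\pi}$.

Give the analogs of the above formulas for the case where $D\xi = \sigma_1^2$ and $D\eta =
\sigma_2^2$, for arbitrary $\sigma_1 > 0$ and $\sigma_2 > 0$.
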
Problem 2.13.27. Let (*) be any bi-variate Gaussian random variable with co- 


variance matrix 
yn) 
Oo” O 
cov(X, Y) = ( 2 . 
Oo” g 


(1)=2() 


where Q is an orthogonal matrix and £ and 7 are two independent Gaussian random 
variables. 


Write (z ) in the form 


Problem 2.13.28. Let $\xi = (\xi_1, \ldots, \xi_n)$ be any non-degenerate Gaussian vector
with vanishing mean and with covariance matrix $R = \|E\xi_i\xi_j\|$, and suppose that
$\lambda_1, \ldots, \lambda_n$ are the eigenvalues of the matrix $R$. Prove that the characteristic function,
$\varphi(t)$, of the random variable $\xi_1^2 + \ldots + \xi_n^2$ coincides with the characteristic function
of a random variable of the form $\lambda_1 \eta_1^2 + \ldots + \lambda_n \eta_n^2$, where $\eta_1, \ldots, \eta_n$ are independent
standard Gaussian random variables ($\eta_k \sim \mathscr{N}(0, 1)$), and, furthermore, one has

\[
\varphi(t) = \prod_{j=1}^{n} (1 - 2it\lambda_j)^{-1/2}.
\]


Problem 2.13.29. Let $\xi_1, \ldots, \xi_n$, $n \ge 2$, be any set of independent and identically
distributed random variables. Prove that the distribution of the vector $(\xi_1, \ldots, \xi_n)$
is rotation invariant if and only if each of the variables $\xi_1, \ldots, \xi_n$ is normally
distributed with vanishing mean.

Hint. Use characteristic functions.

Problem 2.13.30. (Statistics of the normal distribution $\mathscr{N}(m, \sigma^2)$: part I.)
Suppose that $\xi_1, \ldots, \xi_n$, $n \ge 2$, are independent and identically distributed normal,
$\mathscr{N}(m, \sigma^2)$, random variables. Prove that the variables

\[
\bar{\xi} = \frac{1}{n} \sum_{i=1}^{n} \xi_i \quad \text{and} \quad s_1^2 = \frac{1}{n-1} \sum_{i=1}^{n} (\xi_i - \bar{\xi})^2
\]

are independent and

\[
(n - 1)\, s_1^2 \overset{d}{=} \sum_{i=1}^{n-1} (\xi_i - m)^2.
\]

Hint. Use the statement in the previous problem.


Problem 2.13.31. (Statistics of the normal distribution $\mathscr{N}(m, \sigma^2)$: part II.) Let
$\xi_1, \ldots, \xi_n$ be any set of independent and identically distributed random variables
with normal distribution $\mathscr{N}(m, \sigma^2)$, and let $x = (x_1, \ldots, x_n)$ be some sample of
observations over $\xi = (\xi_1, \ldots, \xi_n)$, $n \ge 1$.

(a) Prove that the pairs of statistics

\[
T_1(x) = \sum_{i=1}^{n} x_i, \qquad T_2(x) = \sum_{i=1}^{n} x_i^2
\]

and

\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad s^2(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2
\]

are sufficient.

(b) Convince yourself that

\[
s^2(x) = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2.
\]


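Problem 2.13.32. (Statistics of the normal distribution $\mathscr{N}(m, \sigma^2)$: part III, $m$
is unknown and $\sigma^2 = \sigma_0^2$.) In this and the following problem it is assumed
that $\xi_1, \ldots, \xi_n$ is a set of independent and identically distributed $\mathscr{N}(m, \sigma^2)$-random
variables and the notation from Problem 2.13.30 (with $n \ge 2$) and from
Problem 2.13.31 (with $n \ge 1$) is assumed.

Suppose that $m$ is unknown, while $\sigma^2$ is known to be $\sigma^2 = \sigma_0^2$.

(a) Prove that, for $\bar{\xi} = \frac{1}{n} \sum_{i=1}^{n} \xi_i$ $(= \bar{x}(\xi))$, one has

\[
E\bar{\xi} = m \ \ \text{(unbiased estimate)} \quad \text{and} \quad D\bar{\xi} = \frac{\sigma_0^2}{n}.
\]

(b) Prove that (for $\sigma^2 = \sigma_0^2$) the sample mean $\bar{x}$ is an effective estimate, i.e., an
unbiased estimate with minimal dispersion. For that purpose, prove that in this case the
unbiased estimate, $T(x)$, for the parameter $m$ satisfies the Rao–Cramér inequality:

\[
DT \ge \Big( n\, E\Big[ \frac{\partial \ln p_{(m, \sigma_0^2)}(\xi_1)}{\partial m} \Big]^2 \Big)^{-1} = \Big( \frac{n}{\sigma_0^2} \Big)^{-1},
\]

where

\[
p_{(m, \sigma_0^2)}(x) = \frac{1}{\sqrt{2\pi\sigma_0^2}}\, e^{-\frac{(x - m)^2}{2\sigma_0^2}}.
\]

(c) Prove that the variable

\[
\frac{\bar{\xi} - m}{\sigma_0/\sqrt{n}}
\]

has a standard normal, i.e., $\mathscr{N}(0, 1)$, distribution, and, furthermore, if $\lambda(\varepsilon)$ is chosen
so that

\[
1 - \varepsilon = \frac{1}{\sqrt{2\pi}} \int_{-\lambda(\varepsilon)}^{\lambda(\varepsilon)} e^{-t^2/2}\, dt \quad (= 2\Phi(\lambda(\varepsilon)) - 1),
\]

where $0 < \varepsilon < 1$, then the interval

\[
\Big( \bar{x} - \frac{\sigma_0}{\sqrt{n}}\, \lambda(\varepsilon),\ \bar{x} + \frac{\sigma_0}{\sqrt{n}}\, \lambda(\varepsilon) \Big)
\]

is a confidence interval for $m$ with confidence level $1 - \varepsilon$, i.e., the "probability for
cover" satisfies

\[
P_{(m, \sigma_0^2)}\Big\{ \bar{x}(\xi) - \frac{\sigma_0}{\sqrt{n}}\, \lambda(\varepsilon) \le m \le \bar{x}(\xi) + \frac{\sigma_0}{\sqrt{n}}\, \lambda(\varepsilon) \Big\} = 1 - \varepsilon,
\]

where $P_{(m, \sigma_0^2)}$ stands for the probability law with density $p_{(m, \sigma_0^2)}$. (Comp. with the
Definition in [P §1.7, 2].)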


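Problem 2.13.33. (Statistics of the normal distribution $\mathscr{N}(m, \sigma^2)$: part IV, $m =
m_0$, but $\sigma^2$ is unknown.)
If $m$ is known ($m = m_0$), then it is natural to estimate $\sigma^2$ not by the variable
$s^2(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$, but, rather, by the variable

\[
s_0^2(x) = \frac{1}{n} \sum_{i=1}^{n} (x_i - m_0)^2.
\]

(a) Prove that

\[
E s_0^2(\xi) = \sigma^2 \ \ \text{(unbiased estimate)} \quad \text{and} \quad D s_0^2(\xi) = \frac{2\sigma^4}{n}.
\]

(b) Prove that the sample dispersion $s_0^2(x)$ (with $m = m_0$) is an effective estimate
of the variable $\sigma^2$, i.e., an unbiased estimate with minimal dispersion. To this end,
prove that the Rao–Cramér inequality for the unbiased estimate $T(x)$ of the variable
$\sigma^2$ has the form:

\[
DT \ge \Big( n\, E\Big[ \frac{\partial \ln p_{(m_0, \sigma^2)}(\xi_1)}{\partial \sigma^2} \Big]^2 \Big)^{-1} = \frac{2\sigma^4}{n}.
\]

Remark. As for the accuracy of the estimate $s_0^2(x)$, one can construct a
confidence interval for $\sigma^2$ by using the following considerations.

Given $x = (x_1, \ldots, x_n)$, let

\[
\chi_n^2(x) = \sum_{i=1}^{n} \Big( \frac{x_i - m_0}{\sigma} \Big)^2.
\]

Since

\[
\chi_n^2(\xi) \overset{d}{=} \sum_{i=1}^{n} \eta_i^2,
\]

where $\eta_1, \ldots, \eta_n$ are independent standard normal random variables, according to
[P §2.8, (34)] the variable $\chi_n^2(\xi)$ has $\chi^2$-distribution with $n$ degrees
of freedom; more specifically, it has density ($x \ge 0$)

\[
f_{\chi_n^2}(x) = \frac{x^{\frac{n}{2} - 1}\, e^{-x/2}}{2^{n/2}\, \Gamma(n/2)}
\]

(see also [P §2.3, Table 3]). Since, at the same time, one can write

\[
s_0^2(x) = \frac{\chi_n^2(x)\, \sigma^2}{n},
\]

one must have

\[
P_{(m_0, \sigma^2)}\Big\{ \frac{s_0^2(\xi)\, n}{\sigma^2} \le x \Big\} = \int_{0}^{x} f_{\chi_n^2}(t)\, dt.
\]

For this reason, given any $0 < \varepsilon < 1$, it is possible to find a $\lambda'(\varepsilon)$ and $\lambda''(\varepsilon)$ so that

\[
\int_{0}^{\lambda'(\varepsilon)} f_{\chi_n^2}(t)\, dt = \frac{\varepsilon}{2} \quad \text{and} \quad \int_{\lambda''(\varepsilon)}^{\infty} f_{\chi_n^2}(t)\, dt = \frac{\varepsilon}{2}.
\]

Consequently,

\[
\int_{\lambda'(\varepsilon)}^{\lambda''(\varepsilon)} f_{\chi_n^2}(t)\, dt = 1 - \varepsilon.
\]

Furthermore, the interval

\[
\Big( \frac{s_0^2(x)\, n}{\lambda''(\varepsilon)},\ \frac{s_0^2(x)\, n}{\lambda'(\varepsilon)} \Big)
\]

is a confidence interval for $\sigma^2$ with confidence level $(1 - \varepsilon)$, since

\[
\Big\{ \frac{s_0^2(x)\, n}{\lambda''(\varepsilon)} \le \sigma^2 \le \frac{s_0^2(x)\, n}{\lambda'(\varepsilon)} \Big\} = \big\{ \lambda'(\varepsilon) \le \chi_n^2(x) \le \lambda''(\varepsilon) \big\}.
\]

Finally, we note that the choice of $\varepsilon > 0$ does not determine uniquely $a'(\varepsilon)$ and
$a''(\varepsilon)$ from the relation

\[
\int_{a'(\varepsilon)}^{a''(\varepsilon)} f_{\chi_n^2}(t)\, dt = 1 - \varepsilon.
\]

(c) How should one choose $a'(\varepsilon)$ and $a''(\varepsilon)$ in order to define the narrowest
possible confidence interval for $\sigma^2$ with confidence level $(1 - \varepsilon)$? Are these values
for $a'(\varepsilon)$ and $a''(\varepsilon)$ going to be the same as $\lambda'(\varepsilon)$ and $\lambda''(\varepsilon)$?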

Problem 2.13.34. (Statistics of the normal distribution $\mathscr{N}(m, \sigma^2)$: part V, $m$ is
unknown and $\sigma^2$ is unknown.)

(a) Prove that in this case, for any $n > 1$, the unbiased estimates for $m$ and $\sigma^2$
are given by

\[
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad s_1^2(x) = \frac{n}{n-1}\, s^2(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.
\]

(b) Prove that the statistic

\[
t_{n-1}(x) = \frac{\bar{x} - m}{s_1(x)/\sqrt{n}}
\]

has Student distribution with $n - 1$ degrees of freedom (see [P §2.3, Table 3]).

Hint. Write the variables $t_{n-1}(x)$ in the form

\[
t_{n-1}(x) = \frac{\big( \frac{\bar{x} - m}{\sigma} \big) \sqrt{n}}{s_1(x)/\sigma}
\]

and notice that:

(i) The numerator in the last expression has standard normal, $\mathscr{N}(0, 1)$,
distribution.

(ii) The denominator $\frac{s_1(\xi)}{\sigma}$ has the same distribution as the random variable
$\sqrt{\frac{1}{n-1}\, \chi_{n-1}^2}$, where $\chi_{n-1}^2 = \sum_{i=1}^{n-1} \eta_i^2$ and $\eta_1, \ldots, \eta_{n-1}$ are independent standard
normal, $\mathscr{N}(0, 1)$, random variables.

(iii) The variables $\big( \frac{\bar{x} - m}{\sigma} \big) \sqrt{n}$ and $\frac{s_1(x)}{\sigma}$ are independent.

The desired statement with regard to the variables $t_{n-1}(\xi)$ follows from (i), (ii),
(iii), and the formula [P §2.8, (38)].

(c) By taking into account that the variable $t_{n-1}(x) = \frac{\bar{x} - m}{s_1(x)/\sqrt{n}}$ has Student
distribution, construct confidence intervals for the parameter $m$ with confidence
level $1 - \varepsilon$.

(d) Prove that the variable $(n - 1) \big( \frac{s_1(x)}{\sigma} \big)^2$ has $\chi^2$-distribution with $(n - 1)$ degrees
of freedom and, by using this property, construct a confidence interval for the
parameter $\sigma$ with confidence level $(1 - \varepsilon)$.


Problem 2.13.35. Suppose that $\varphi(t)$ is the characteristic function from
Problem 2.13.28 and prove that for every choice of $0 < a_1 < \ldots < a_n$ and $p_k \ge 0$,
$1 \le k \le n$, with $\sum_{k=1}^{n} p_k = 1$, the function


Wo =D reft) 


k=1 
is characteristic. 


Problem 2.13.36. Consider the Gaussian sequence $X = (X_n)_{n \ge 0}$, with
covariance function of the form

\[
e^{-|i-j|} \quad \text{or} \quad \min(i, j) \ \big(= 2^{-1}(|i| + |j| - |i - j|)\big), \quad i, j = 0, 1, 2, \ldots.
\]

What structural properties (such as independent increments, stationarity, Markovian,
etc.) does this sequence have?


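Problem 2.13.37. Let $N$ be a standard Gaussian random variable ($N \sim \mathscr{N}(0, 1)$).
Prove that for any $\alpha < 1$ one has

\[
E\, \frac{1}{|N|^{\alpha}} = \frac{1}{\sqrt{\pi}}\, 2^{-\alpha/2}\, \Gamma\Big( \frac{1}{2} - \frac{\alpha}{2} \Big).
\]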


Problem 2.13.38. Let $X$ and $Y$ be two independent standard normal ($\mathscr{N}(0, 1)$)
random variables. Prove that

\[
E\, \frac{1}{(X^2 + Y^2)^{p/2}} < \infty
\]

if and only if $p < 2$.

Problem 2.13.39. Let everything be as in Problem 2.13.38 and suppose that

\[
T = \frac{1}{2}\,(X^2 + Y^2), \qquad g = \frac{X^2}{X^2 + Y^2}.
\]

Prove that:
(a) $T$ and $g$ are independent;
(b) $T$ has exponential distribution ($P\{T > t\} = e^{-t}$, $t \ge 0$);
(c) $g$ has arcsine distribution (with density $\frac{1}{\pi\sqrt{x(1-x)}}$, $x \in (0, 1)$).


Problem 2.13.40. Let $B = (B_t)_{t \ge 0}$ be a Brownian motion and let

\[
T_a = \inf\{t \ge 0 : B_t = a\}
\]

be the first passage time to level $a > 0$, with the understanding that $T_a = \infty$ if the
set in the right side of the last relation is empty.

By using the reflection principle, i.e., the property $P\{\sup_{s \le t} B_s > a\} = 2P\{B_t \ge
a\}$ (see [17], [103]), prove that the density $p_a(t) = \frac{\partial P\{T_a \le t\}}{\partial t}$, $t > 0$, is given by the
formula

\[
p_a(t) = \frac{a}{\sqrt{2\pi t^3}}\, e^{-\frac{a^2}{2t}}.
\]

Hint. Use the fact that $P\{T_a < t\} = 2P\{B_t > a\}$.
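
The relation in the hint can be turned into a quick numerical check of the density formula. The sketch below assumes $B_t \sim \mathscr{N}(0, t)$, so that $P\{T_a < t\} = 2\,(1 - \Phi(a/\sqrt{t}))$, differentiates this numerically in $t$ by a central difference, and compares the result with $a\,(2\pi t^3)^{-1/2} e^{-a^2/(2t)}$. The step size and function names are illustrative.

```python
import math

def std_normal_cdf(x: float) -> float:
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def passage_cdf(a: float, t: float) -> float:
    """P{T_a < t} = 2 P{B_t > a}, with B_t ~ N(0, t)."""
    return 2.0 * (1.0 - std_normal_cdf(a / math.sqrt(t)))

def passage_density(a: float, t: float) -> float:
    """Closed-form density a / sqrt(2 pi t^3) * exp(-a^2 / (2t))."""
    return a / math.sqrt(2.0 * math.pi * t ** 3) * math.exp(-a * a / (2.0 * t))

def numeric_density(a: float, t: float, h: float = 1e-5) -> float:
    """Central-difference derivative of the passage-time distribution function."""
    return (passage_cdf(a, t + h) - passage_cdf(a, t - h)) / (2.0 * h)

if __name__ == "__main__":
    for a, t in [(1.0, 0.5), (1.0, 2.0), (2.0, 3.0)]:
        print(a, t, passage_density(a, t), numeric_density(a, t))
```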


Problem 2.13.41. Let $T = T_1$, where $T_a$ is the first passage time defined in the
previous problem. Prove that

\[
T \overset{\text{law}}{=} N^{-2},
\]

where $N$ is a standard normal ($N \sim \mathscr{N}(0, 1)$) random variable. In addition, prove
that the Laplace transform of $T$ is given by

\[
E e^{-\frac{\lambda^2}{2} T} = E e^{-\frac{\lambda^2}{2N^2}} = e^{-\lambda}, \quad \lambda \ge 0,
\]

while the Fourier transform of $T$ is given by

\[
E e^{itT} = E e^{\frac{it}{N^2}} = \exp\Big\{ -|t|^{1/2} \Big( 1 - i\, \frac{t}{|t|} \Big) \Big\}, \quad t \in \mathbb{R}.
\]

(The above relations may be viewed as a constructive definition of the random
variable $1/N^2$, which has a stable distribution with parameters $\alpha = \frac{1}{2}$, $\beta = 0$,
$\theta = -1$, and $d = 1$ (see [P §3.6, (9) and (10)]).)



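Problem 2.13.42. Let $X$ and $Y$ be two independent normally distributed
($\mathscr{N}(0, \sigma^2)$) random variables.

(a) Prove that the variables

\[
\frac{2XY}{\sqrt{X^2 + Y^2}} \quad \text{and} \quad \frac{X^2 - Y^2}{\sqrt{X^2 + Y^2}}
\]

are independent and normally distributed with mean 0 and dispersion 1/2.

(b) Conclude that

\[
\frac{X}{Y} - \frac{Y}{X} \overset{\text{law}}{=} 2C,
\]

where $C$ is a Cauchy random variable with density $1/(\pi(1 + x^2))$, $x \in \mathbb{R}$, and that

\[
C - \frac{1}{C} \overset{\text{law}}{=} 2C.
\]

(c) Generalize this result by showing that for every $a > 0$ one has

\[
C - \frac{a}{C} \overset{\text{law}}{=} (1 + a)\, C.
\]

(d) Prove that the variables $X^2 + Y^2$ and $\frac{X}{\sqrt{X^2 + Y^2}}$ are independent.

Hint. (a) Use the representation for the variables $X$ and $Y$ obtained in
Problem 2.8.13.
(b) Use the result in Problem 2.8.27 (a).
(c) For the proof it suffices to show that if $f(x) = x - \frac{a}{x}$, then for every
bounded function $g(x)$ the integrals $\int_{-\infty}^{\infty} g(f(x))\, \frac{dx}{1 + x^2}$ and $\int_{-\infty}^{\infty} g(x)\, \frac{(1 + a)\, dx}{(1 + a)^2 + x^2}$
coincide.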


Problem 2.13.43. Prove that for any $0 < H \le 1$ the function

\[
R(s, t) = \frac{1}{2}\,\big( t^{2H} + s^{2H} - |t - s|^{2H} \big), \quad s, t \ge 0,
\]

is non-negative definite (see formula [P §2.13, (24)]) and that, therefore, one can
construct a Gaussian process $B^H = (B^H_t)_{t \ge 0}$ with mean 0 and covariance function
$R(s, t)$. (By using Kolmogorov's criterion (see, for example, [17]) it is possible to
show that, in fact, $B^H = (B^H_t)_{t \ge 0}$ may be chosen to have continuous sample paths.
Such a process is commonly referred to as a fractional Brownian motion with Hurst
parameter $H$.)

Convince yourself that for $H > 1$ the function $R(s, t)$ is not non-negative
definite.

Problem 2.13.44. (The Darmois–Skitovich theorem.) Let $\xi_1, \ldots, \xi_n$ be independent
and identically distributed random variables and let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be
some non-zero constants. Prove that the following characterization holds: if the
random variables $\sum_{i=1}^{n} a_i \xi_i$ and $\sum_{i=1}^{n} b_i \xi_i$ are independent, then the variables
$\xi_1, \ldots, \xi_n$ must have normal distribution. (With $n = 2$ and with $a_1 = a_2 = 1$, $b_1 =
1$ and $b_2 = -1$ this is nothing but the Bernstein theorem from Problem 2.13.10.)


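Problem 2.13.45. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent standard normal
($\mathscr{N}(0, 1)$) random variables. Prove that as $n \to \infty$ the random variables

\[
\sqrt{n}\; \frac{\sum_{i=1}^{n} \xi_i}{\sum_{i=1}^{n} \xi_i^2} \quad \text{and} \quad \frac{\sum_{i=1}^{n} \xi_i}{\big( \sum_{i=1}^{n} \xi_i^2 \big)^{1/2}}
\]

converge in distribution to a standard normal ($\mathscr{N}(0, 1)$) random variable.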


Problem 2.13.46. Let (X, Y) be any pair of Gaussian random variables with EX = EY = 0, DX = DY = 1, and with correlation coefficient $\rho_{X,Y}$. Prove that the correlation coefficient $\rho_{\Phi(X),\Phi(Y)}$ of the variables $\Phi(X)$ and $\Phi(Y)$, where $\Phi(x) = (2\pi)^{-1/2}\int_{-\infty}^{x} e^{-y^2/2}\,dy$, is given by the formula

$$\rho_{\Phi(X),\Phi(Y)} = \frac{6}{\pi}\arcsin\frac{\rho_{X,Y}}{2}.$$
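The formula can be checked empirically before it is proved. The following Monte Carlo sketch is an illustration under stated assumptions: the pair (X, Y) is generated as $Y = \rho X + \sqrt{1-\rho^2}\,Z$ with independent standard normal X and Z, and the names `simulated_rho` and `claimed_rho`, the sample size and the seed are choices made for this example only.

```python
import math
import random

def std_normal_cdf(x):
    """Phi(x), expressed through the error function from the standard library."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sample_correlation(us, vs):
    """Plain sample correlation coefficient of two equally long lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    var_u = sum((u - mu) ** 2 for u in us)
    var_v = sum((v - mv) ** 2 for v in vs)
    return cov / math.sqrt(var_u * var_v)

def simulated_rho(rho, n_samples, seed=0):
    """Empirical correlation of Phi(X) and Phi(Y) when corr(X, Y) = rho."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho * rho) * z
        us.append(std_normal_cdf(x))
        vs.append(std_normal_cdf(y))
    return sample_correlation(us, vs)

def claimed_rho(rho):
    """Right-hand side of the formula in the problem."""
    return 6.0 / math.pi * math.asin(rho / 2.0)

print(simulated_rho(0.6, 100_000), claimed_rho(0.6))
```

The two printed numbers should agree up to Monte Carlo error of order $n^{-1/2}$; the boundary cases $\rho = 0$ and $\rho = 1$ give 0 and 1, as they must.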


Problem 2.13.47. Let (X, Y, Z) be any 3-dimensional Gaussian random vector with EX = EY = EZ = 0, DX = DY = DZ = 1 and with correlation coefficients $\rho(X,Y) = \rho_1$, $\rho(X,Z) = \rho_2$, $\rho(Y,Z) = \rho_3$. Prove that (comp. with statement (b) in Problem 2.13.23)

$$P\{X > 0,\, Y > 0,\, Z > 0\} = \frac{1}{8} + \frac{1}{4\pi}\bigl\{\arcsin\rho_1 + \arcsin\rho_2 + \arcsin\rho_3\bigr\}.$$

Hint. Let A = {X > 0}, B = {Y > 0}, C = {Z > 0}. Then, for $p = P(A \cap B \cap C)$, by the "inclusion–exclusion formula" (Problem 1.1.12), one has

$$1 - p = P(A \cup B \cup C) = [P(A) + P(B) + P(C)] - [P(A \cap B) + P(A \cap C) + P(B \cap C)] + p.$$

Finally, use the result in Problem 2.13.23(b).
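The arithmetic behind the hint can be written out as follows. This is only a sketch, and it assumes that Problem 2.13.23(b) is the classical orthant formula $P\{X > 0, Y > 0\} = \frac14 + \frac{1}{2\pi}\arcsin\rho$ for a centered Gaussian pair with unit variances and correlation $\rho$. The identity $1 - p = P(A \cup B \cup C)$ comes from the symmetry of the law of (X, Y, Z) under the change of sign $(X, Y, Z) \to (-X, -Y, -Z)$. Substituting $P(A) = P(B) = P(C) = \frac12$ gives

$$1 - p = \frac{3}{2} - \sum_{i=1}^{3}\Bigl(\frac{1}{4} + \frac{1}{2\pi}\arcsin\rho_i\Bigr) + p
\;\Longrightarrow\;
2p = \frac{1}{4} + \frac{1}{2\pi}\sum_{i=1}^{3}\arcsin\rho_i
\;\Longrightarrow\;
p = \frac{1}{8} + \frac{1}{4\pi}\sum_{i=1}^{3}\arcsin\rho_i .$$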


Problem 2.13.48. Prove that the Laplace transform, $E e^{-\lambda R^2}$, $\lambda > 0$, of the square of the "span" of the Brownian bridge $B^\circ = (B^\circ_t)_{0\le t\le 1}$, namely, of the quantity

$$R = \sqrt{\frac{2}{\pi}}\,\Bigl(\max_{0\le t\le 1} B^\circ_t - \min_{0\le t\le 1} B^\circ_t\Bigr),$$

is given by the formula

$$E e^{-\lambda R^2} = \Bigl(\frac{\sqrt{\lambda\pi}}{\sinh\sqrt{\lambda\pi}}\Bigr)^2.$$


Problem 2.13.49. (O. V. Viskov.) Let $\eta$ and $\zeta$ be any two independent standard normal ($N(0,1)$) random variables. Prove that:

(a) For any given function $f = f(z)$, $z \in C$, with $E|f(x + (\eta + i\zeta))| < \infty$, the following "averaging" property is in force:

$$f(x) = E f(x + (\eta + i\zeta)).$$

(b) For any Hermite polynomial $He_n(x)$, $n \ge 0$ (see p. 380 in the Appendix), the following representation is in force:

$$He_n(x) = E(x + i\zeta)^n.$$
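Representation (b) can be tested on low degrees by expanding $(x + i\zeta)^n$ with the binomial formula and using $E\zeta^{2k} = (2k-1)!!$, $E\zeta^{2k+1} = 0$. The sketch below compares the resulting coefficients with the three-term recurrence $He_{m+1}(x) = x\,He_m(x) - m\,He_{m-1}(x)$. It assumes that $He_n$ denotes the Hermite polynomials with leading coefficient 1 that are orthogonal with respect to the standard normal law (the Appendix is not reproduced here); the function names are chosen for this example.

```python
from math import comb

def double_factorial_odd(k):
    """(2k - 1)!! = E[zeta^(2k)] for a standard normal zeta; equals 1 for k = 0."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result

def hermite_via_moments(n):
    """Coefficients (index = power of x) of E(x + i*zeta)^n."""
    coeffs = [0] * (n + 1)
    for k in range(n // 2 + 1):
        # i^(2k) = (-1)^k, and the odd moments of zeta vanish
        coeffs[n - 2 * k] = (-1) ** k * comb(n, 2 * k) * double_factorial_odd(k)
    return coeffs

def hermite_via_recurrence(n):
    """He_0 = 1, He_1 = x, He_{m+1}(x) = x He_m(x) - m He_{m-1}(x)."""
    prev, curr = [1], [0, 1]
    if n == 0:
        return prev
    for m in range(1, n):
        nxt = [0] + curr          # multiply He_m by x
        for idx, c in enumerate(prev):
            nxt[idx] -= m * c     # subtract m * He_{m-1}
        prev, curr = curr, nxt
    return curr

for n in range(6):
    print(n, hermite_via_moments(n), hermite_via_recurrence(n))
```

For n = 4 both routes give the coefficient list of $x^4 - 6x^2 + 3$.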


Chapter 3 

Topology and Convergence in Spaces 
of Probability Measures: The Central 
Limit Theorem 


3.1 Weak Convergence of Probability Measures 
and Distributions 


Problem 3.1.1. We say that the function $F = F(x)$, defined on $R^m$, is continuous at the point $x \in R^m$ if, for every $\varepsilon > 0$, one can find a $\delta > 0$, such that $|F(x) - F(y)| < \varepsilon$ for all $y \in R^m$ that satisfy

$$x - \delta e < y < x + \delta e,$$

where $e = (1, \dots, 1) \in R^m$. The sequence of distribution functions $(F_n)_{n\ge 1}$ is said to converge in general to the distribution function $F$ (notation: $F_n \Rightarrow F$) if $F_n(x) \to F(x)$ as $n \to \infty$, for any $x \in R^m$ at which the function $F = F(x)$ is continuous.

Prove that the statement in [P §3.1, Theorem 2] also holds for the spaces $R^m$, $m > 1$ (see Remark 1 after [P §3.1, Theorem 1]).

Hint. In the context of [P §3.1], it is enough to show only the equivalence (1) $\Leftrightarrow$ (4). To prove the implication (1) $\Rightarrow$ (4), suppose that $x \in R^m$ is a continuity point for $F$, and convince yourself that if $\partial(-\infty, x]$ is the boundary of the set $(-\infty, x] = (-\infty, x_1] \times \dots \times (-\infty, x_m]$, then $P(\partial(-\infty, x]) = 0$, so that $P_n((-\infty, x]) \to P((-\infty, x])$, i.e., $F_n(x) \to F(x)$. The proof of the implication (4) $\Rightarrow$ (1) in the m-dimensional case is analogous to the one-dimensional argument in the proof of [P §3.1, Theorem 2].


Problem 3.1.2. Prove that in the spaces $R^n$ the class of "elementary" sets, $\mathscr{K}$, is a convergence defining class.


Problem 3.1.3. Let E be one of the spaces $R^\infty$, C or D (see [P §2.2]). The sequence of probability measures $(P_n)_{n\ge 1}$ (defined on the Borel σ-algebra, $\mathscr{E}$, generated by the open sets in the respective space) converges in general, in the sense of finite-dimensional distributions, to the probability measure P (notation: $P_n \overset{f}{\Rightarrow} P$), if $P_n(A) \to P(A)$ as $n \to \infty$, for all cylindrical sets A with $P(\partial A) = 0$.

Prove that in the case of the space $R^\infty$ one has

$$(P_n \overset{f}{\Rightarrow} P) \iff (P_n \Rightarrow P). \qquad (*)$$

Can one make the same statement for the spaces C and D?

Hint. The implication $\Leftarrow$ in (∗) is straightforward. Therefore it is enough to prove only that $(P_n \overset{f}{\Rightarrow} P) \Rightarrow (P_n \Rightarrow P)$. Let $f$ be any bounded ($|f| \le c$) function from the space $C(R^\infty)$. Given any $m \in N = \{1, 2, \dots\}$, define the functions $f_m : R^\infty \to R$ by

$$f_m(x_1, \dots, x_m, x_{m+1}, \dots) = f(x_1, \dots, x_m, 0, 0, \dots).$$

Clearly, one has $f_m \in C(R^\infty)$, $|f_m| \le c$ and $f_m(x) \to f(x)$, for every $x \in R^\infty$. Next, consider the sets

$$A_m = \{x \in R^\infty : |f_m(x) - f(x)| \le \varepsilon\},$$

and convince yourself that the following estimate holds for all sufficiently large n and m:

$$\Bigl|\int_{R^\infty} f_m\,dP_n - \int_{R^\infty} f\,dP_n\Bigr| \le \varepsilon P_n(A_m) + 2c\,P_n(\overline{A}_m) \le \varepsilon + 4c\varepsilon.$$

Then notice that $\int_{R^\infty} f_m\,dP_n \to \int_{R^\infty} f_m\,dP$ for every m and by using the above estimate prove that

$$\varlimsup_n \Bigl|\int_{R^\infty} f\,dP_n - \int_{R^\infty} f_m\,dP\Bigr| \le \varepsilon + 4c\varepsilon, \qquad \varliminf_n \Bigl|\int_{R^\infty} f\,dP_n - \int_{R^\infty} f_m\,dP\Bigr| \le \varepsilon + 4c\varepsilon,$$

for all sufficiently large m. The Lebesgue dominated convergence theorem yields $\int_{R^\infty} f_m\,dP \to \int_{R^\infty} f\,dP$, and the previous two inequalities yield:

$$\varlimsup_n \Bigl|\int_{R^\infty} f\,dP_n - \int_{R^\infty} f\,dP\Bigr| \le \varepsilon + 4c\varepsilon; \qquad \varliminf_n \Bigl|\int_{R^\infty} f\,dP_n - \int_{R^\infty} f\,dP\Bigr| \le \varepsilon + 4c\varepsilon.$$

Since $\varepsilon > 0$ is arbitrarily chosen, it follows that

$$\int_{R^\infty} f\,dP_n \to \int_{R^\infty} f\,dP, \quad n \to \infty.$$
Problem 3.1.4. Let F and G be any two distribution functions on the real line and let

$$L(F, G) = \inf\{h > 0 : F(x - h) - h \le G(x) \le F(x + h) + h \text{ for all } x \in R\}$$

be the Lévy distance between them. Prove that the convergence in the Lévy metric, $L(\cdot,\cdot)$, is equivalent to the convergence in general, i.e.,

$$(F_n \Rightarrow F) \iff (L(F_n, F) \to 0).$$

Hint. The implication $(L(F_n, F) \to 0) \Rightarrow (F_n \Rightarrow F)$ follows directly from the definition. The inverse implication can be established by contradiction, i.e., by showing that $F_n \Rightarrow F$, while, at the same time, $L(F_n, F) \not\to 0$, leads to a contradiction.


Problem 3.1.5. Suppose that $F_n \Rightarrow F$ and that the distribution function F is continuous. Prove that the functions $F_n(x)$ converge uniformly to $F(x)$ as $n \to \infty$ (comp. with Problem 1.6.8):

$$\sup_x |F_n(x) - F(x)| \to 0, \quad n \to \infty.$$

Hint. Choose an arbitrary $\varepsilon > 0$ and let $m > 1/\varepsilon$. Taking into account that F is continuous, choose the points $x_1, \dots, x_{m-1}$ so that $F(x_i) = \frac{i}{m}$ and $|F_n(x_i) - F(x_i)| < \varepsilon$, $i = 1, \dots, m-1$, for any sufficiently large n. Conclude that for any $x \in [x_k, x_{k+1}]$ (with the understanding that $x_0 = -\infty$ and $x_m = \infty$) one must have

$$F_n(x) - F(x) \le F_n(x_{k+1}) - F(x_k) \le F(x_{k+1}) + \varepsilon - F(x_k) = \varepsilon + \frac{1}{m} < 2\varepsilon.$$

Analogously, $F(x) - F_n(x) < 2\varepsilon$ and, therefore, $|F_n(x) - F(x)| < 2\varepsilon$ for all $x \in R$.


Problem 3.1.6. Prove the statement formulated in Remark 1 after Theorem 1 in 
[P §3.1]. 


Problem 3.1.7. Prove the equivalence of conditions (I*)—(IV*), formulated in 
Remark 2 after Theorem 1 in [P §3.1]. 


Problem 3.1.8. Prove that $P_n \xrightarrow{w} P$ ($\xrightarrow{w}$ stands for "weakly converges to") if and only if every sub-sequence, $(P_{n'})$, of the sequence $(P_n)$ contains a sub-sub-sequence $(P_{n''})$ with the property $P_{n''} \xrightarrow{w} P$.

Hint. The necessity part is obvious. For the sufficiency part, it is enough to notice that if $P_n \not\xrightarrow{w} P$, then one can find: some continuous and bounded function $f$, some $\varepsilon > 0$, and some sub-sequence $(n')$, so that

$$\Bigl|\int_E f\,dP_{n'} - \int_E f\,dP\Bigr| > \varepsilon.$$

By using this property, one can show that the existence of a sub-sub-sequence $(n'') \subseteq (n')$ with $P_{n''} \xrightarrow{w} P$ leads to a contradiction.
Problem 3.1.9. Give an example of probability measures $P$, $P_n$, $n \ge 1$, on $(R, \mathscr{B}(R))$, such that $P_n \xrightarrow{w} P$ and, at the same time, it is not true that $P_n(B) \to P(B)$ for all Borel sets $B \in \mathscr{B}(R)$.


Problem 3.1.10. Give an example of distribution functions $F = F(x)$, $F_n = F_n(x)$, $n \ge 1$, such that $F_n \xrightarrow{w} F$, but $\sup_x |F_n(x) - F(x)| \not\to 0$, $n \to \infty$.


Problem 3.1.11. In many probability theory texts, the implication (4) $\Rightarrow$ (3) in [P §3.1, Theorem 2], concerning the convergence of the distribution functions $F_n$, $n \ge 1$, to the distribution function F, is attributed to E. Helly and H. E. Bray. Prove one more time the following statements:

(a) Helly–Bray Lemma. If $F_n \Rightarrow F$ (see Definition 1), then

$$\lim_n \int_a^b g(x)\,dF_n(x) = \int_a^b g(x)\,dF(x),$$

where a and b are any two continuity points for the distribution function $F = F(x)$, and $g = g(x)$ is any continuous function on the interval [a, b].
(b) Helly–Bray Theorem. If $F_n \Rightarrow F$, then

$$\lim_n \int_{-\infty}^{\infty} g(x)\,dF_n(x) = \int_{-\infty}^{\infty} g(x)\,dF(x),$$

for any bounded and continuous function $g = g(x)$ defined on the real line R.


Problem 3.1.12. Suppose that $F_n \Rightarrow F$ and that for some $b > 0$ the sequence $\bigl(\int |x|^b\,dF_n(x)\bigr)_{n\ge 1}$ happens to be bounded. Prove that:

$$\lim_n \int |x|^a\,dF_n(x) = \int |x|^a\,dF(x), \quad 0 < a < b;$$

$$\lim_n \int x^k\,dF_n(x) = \int x^k\,dF(x) \quad\text{for every } k = 1, 2, \dots, [b],\ k \ne b.$$


Problem 3.1.13. Let $F_n \Rightarrow F$ and let $\mu = \mathrm{med}(F)$ and $\mu_n = \mathrm{med}(F_n)$ denote, respectively, the medians of the distributions $F$ and $F_n$, $n \ge 1$ (see Problem 1.4.5). Assuming that the medians $\mu$ and $\mu_n$ are uniquely defined for all $n \ge 1$, prove that $\mu_n \to \mu$.


Problem 3.1.14. Suppose that the distribution function F is uniquely determined by its moments $m^{(k)} = \int_{-\infty}^{\infty} x^k\,dF(x)$, $k = 1, 2, \dots$, and let $(F_n)_{n\ge 1}$ be any sequence of distribution functions, such that

$$m_n^{(k)} = \int_{-\infty}^{\infty} x^k\,dF_n(x) \to m^{(k)} = \int_{-\infty}^{\infty} x^k\,dF(x), \quad k = 1, 2, \dots$$

Prove that $F_n \Rightarrow F$.

Problem 3.1.15. Let $\mu$ be any σ-finite measure on the Borel σ-algebra, $\mathscr{E}$, for some metric space $(E, \rho)$. Prove that for every $B \in \mathscr{E}$ one has

$$\mu(B) = \sup\{\mu(F);\ F \subseteq B,\ F \text{ is closed}\} = \inf\{\mu(G);\ G \supseteq B,\ G \text{ is open}\}.$$


Problem 3.1.16. Prove that a sequence of distribution functions, $F_n$, $n \ge 1$, defined on the real line R, converges weakly to the distribution function F ($F_n \Rightarrow F$) if and only if there is a set D which is everywhere dense in R and is such that $F_n(x) \to F(x)$ for every $x \in D$.


Problem 3.1.17. Suppose that the functions $g(x)$ and $(g_n(x))_{n\ge 1}$, $x \in R$, are continuous and have the properties:

$$\sup_{x,n} |g_n(x)| \le c < \infty;$$
$$\lim_n \sup_{x \in B} |g_n(x) - g(x)| = 0,$$

for every bounded interval B = [a, b].
Prove that the convergence of distribution functions $F_n \Rightarrow F$ implies

$$\lim_n \int_R g_n(x)\,dF_n(x) = \int_R g(x)\,dF(x).$$

By constructing appropriate examples, prove that, in general, the point-wise convergence $g_n(x) \to g(x)$, $x \in R$, is not enough to guarantee the above convergence.


Problem 3.1.18. Suppose that the following convergence of distribution functions takes place: $F_n \Rightarrow F$ as $n \to \infty$.
(a) By constructing appropriate examples, prove that, in general,

$$\int_R x\,dF_n(x) \not\to \int_R x\,dF(x).$$

(b) Prove that if $\sup_n \int_R |x|^k\,dF_n(x) \le c < \infty$, for some $k \ge 1$, then for all $1 \le l \le k - 1$ one must have

$$\int_R x^l\,dF_n(x) \to \int_R x^l\,dF(x).$$


Problem 3.1.19. As a generalization of the previous problem, prove that if $f = f(x)$ is some continuous function, not necessarily bounded, but such that

$$\lim_{|x|\to\infty} \frac{|f(x)|}{g(x)} = 0$$

for some positive function $g = g(x)$ with $\sup_n \int_R g(x)\,dF_n(x) \le c < \infty$, then

$$\int_R f(x)\,dF_n(x) \to \int_R f(x)\,dF(x).$$


3.2 Relative Compactness and Tightness of Families 
of Probability Distributions 


Problem 3.2.1. Prove Theorems 1 and 2 from [P §3.2] for the spaces $R^n$, $n \ge 2$.


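Problem 3.2.2. Let $P_\alpha$ be a Gaussian measure on the real line, with parameters $m_\alpha$ and $\sigma_\alpha^2$, for every $\alpha \in \mathfrak{A}$. Prove that the family $\mathscr{P} = \{P_\alpha;\ \alpha \in \mathfrak{A}\}$ is tight if and only if there are constants, a and b, for which one can write

$$|m_\alpha| \le a, \quad \sigma_\alpha^2 \le b, \quad \alpha \in \mathfrak{A}.$$

Hint. The sufficiency statement follows from the fact that for every $\alpha \in \mathfrak{A}$ one can find a random variable $\eta_\alpha \sim N(0, 1)$, such that $\xi_\alpha = m_\alpha + \sigma_\alpha\eta_\alpha$, where $\xi_\alpha$ has distribution $P_\alpha$. With this observation in mind, one can conclude that $P\{|\xi_\alpha| \ge n\} \le P\{|\eta_\alpha| \ge (n - a)/\sqrt{b}\}$ for $n > a$. As a result, the family $\{P_\alpha\}$ must be tight. The necessity statement can be established by contradiction.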


Problem 3.2.3. Give examples of tight and non-tight families of probability measures $\mathscr{P} = \{P_\alpha;\ \alpha \in \mathfrak{A}\}$, defined on the measurable space $(R^\infty, \mathscr{B}(R^\infty))$.

Hint. Consider the following families of measures:

(a) $\{P_\alpha\}$, where $P_\alpha \equiv P$ is such that

$$P(A) = \begin{cases} 1, & \text{if } (0, 0, \dots) \in A, \\ 0, & \text{if } (0, 0, \dots) \notin A; \end{cases}$$

(b) $\{P_n,\ n \in N\}$, where $P_n$ is a probability measure concentrated at the point $x_n = (n, 0, 0, \dots)$.


Problem 3.2.4. Let P be a probability measure, defined on the Borel σ-algebra, $\mathscr{E}$, in some metric space $(E, \rho)$. We say that the measure P is tight (comp. with [P §3.2, Definition 2]), if for any $\varepsilon > 0$ one can find a compact set $K \subseteq E$, such that $P(K) \ge 1 - \varepsilon$. Prove the following result, known as "Ulam's theorem": every probability measure P, defined on the Borel σ-algebra in some Polish space (i.e., some complete and separable metric space), is automatically tight.


Problem 3.2.5. Suppose that $X = \{X_\alpha \in R^d;\ \alpha \in \mathfrak{A}\}$ is some family of random vectors in $R^d$, chosen so that $\sup_\alpha E\|X_\alpha\|^r < \infty$ for some $r > 0$. Setting $P_\alpha = \mathrm{Law}(X_\alpha)$, $\alpha \in \mathfrak{A}$, show that the family $\mathscr{P} = \{P_\alpha;\ \alpha \in \mathfrak{A}\}$ is tight.
Problem 3.2.6. The family of random vectors $\{\xi_t \in R^n;\ t \in T\}$ is said to be tight, if

$$\lim_{a\to\infty} \sup_{t \in T} P\{\|\xi_t\| > a\} = 0.$$

(a) Prove that $\{\xi_k \in R^n;\ k \ge 0\}$ is tight if and only if

$$\lim_{a\to\infty} \varlimsup_{k\to\infty} P\{\|\xi_k\| > a\} = 0.$$

(b) Prove that the family of non-negative random variables $\{\xi_k;\ k \ge 0\}$ is tight if and only if

$$\lim_{\lambda \downarrow 0} \varlimsup_{k} \bigl[1 - E e^{-\lambda\xi_k}\bigr] = 0.$$


Problem 3.2.7. Let $(\xi_k)_{k\ge 0}$ be any sequence of random vectors in $R^n$, and suppose that $\xi_k \xrightarrow{d} \xi$, i.e., the distributions $F_k$ of the vectors $\xi_k$ converge weakly (equivalently, converge in general) to the distribution F of some random vector $\xi$. Prove that the family $\{\xi_k;\ k \ge 0\}$ is tight.


Problem 3.2.8. Let $(\xi_k)_{k\ge 0}$ be any tight sequence of random variables and suppose that the sequence $(\eta_k)_{k\ge 0}$ is such that $\eta_k \xrightarrow{P} 0$ as $k \to \infty$. Conclude from these conditions that $\xi_k\eta_k \xrightarrow{P} 0$ as $k \to \infty$.


Problem 3.2.9. Let $X_1, X_2, \dots$ be any infinite sequence of exchangeable random variables (for a definition, see Problem 2.5.4) and suppose that the variables $X_i$ take only the values 0 or 1.

Prove the following result: there is a probability distribution function $G = G(\lambda)$ on the interval [0, 1], such that, for every $0 \le k \le n$, and every $n \ge 1$, one has

$$P\{X_1 = 1, \dots, X_k = 1, X_{k+1} = 0, \dots, X_n = 0\} = \int_0^1 \lambda^k (1 - \lambda)^{n-k}\,dG(\lambda).$$

(This is a special case of B. de Finetti's Theorem, according to which the distribution law of every infinite sequence of exchangeable random variables can be identified with the distribution law of a (convex) mixture of infinite sequences of independent and identically distributed random variables; see [1] and [29].)

Hint. Consider the event

$$A_k = \{X_1 = 1, \dots, X_k = 1, X_{k+1} = 0, \dots, X_n = 0\}$$

and write the probability $P(A_k)$ in the form

$$P(A_k) = \sum_{j=0}^{m} P(A_k \mid S_m = j)\,P\{S_m = j\}, \qquad (*)$$

where $m \ge n$ and $S_m = \sum_{i=1}^{m} X_i$. Next, by using the exchangeability property, prove that the right side of (∗) may be re-written as

$$E\Bigl[\prod_{i=0}^{k-1} (mY_m - i) \times \prod_{j=0}^{n-k-1} \bigl(m(1 - Y_m) - j\bigr) \times \frac{1}{\prod_{l=0}^{n-1} (m - l)}\Bigr],$$

where $Y_m = S_m/m$. (Notice that for large m this expression is close to $E[Y_m^k (1 - Y_m)^{n-k}]$.) Finally, pass to the limit as $m \to \infty$ and conclude that the limit can be expressed as $\int_0^1 \lambda^k (1 - \lambda)^{n-k}\,dG(\lambda)$, where $G(\lambda)$ is some distribution function on the interval [0, 1].
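A concrete exchangeable sequence helps to see what the representation says. The sketch below uses the classical Pólya urn (start with one red and one black ball; after each draw return the ball together with one more of the same colour), which is not mentioned in the text and is brought in here purely as an illustration. Its 0/1 draws are exchangeable, and the code checks with exact rational arithmetic that the probability of the pattern "k ones followed by n − k zeros" equals $\int_0^1 \lambda^k(1-\lambda)^{n-k}\,d\lambda = k!\,(n-k)!/(n+1)!$, i.e., that G is the uniform distribution on [0, 1] in this case. All function names are choices made for this example.

```python
from fractions import Fraction
from math import factorial

def polya_path_probability(k, n):
    """P{X_1 = ... = X_k = 1, X_{k+1} = ... = X_n = 0} for a Polya urn that
    starts with one red (value 1) and one black (value 0) ball."""
    red, black = 1, 1
    prob = Fraction(1)
    for step in range(n):
        total = red + black
        if step < k:                       # a red ball is drawn
            prob *= Fraction(red, total)
            red += 1
        else:                              # a black ball is drawn
            prob *= Fraction(black, total)
            black += 1
    return prob

def beta_integral(k, n):
    """Exact value of the integral of l^k (1 - l)^(n - k) over [0, 1]."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

for n in range(1, 5):
    print(n, [polya_path_probability(k, n) == beta_integral(k, n)
              for k in range(n + 1)])
```

The same check with a different urn composition would produce a Beta mixing distribution instead of the uniform one; the code above covers only the (1, 1) start.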


Problem 3.2.10. Let $\xi_1, \dots, \xi_n$ be any sequence of exchangeable random variables, which take the values 0 and 1. Prove that:

(a) $P(\xi_i = 1 \mid S_n) = \dfrac{S_n}{n}$, where $S_n = \xi_1 + \dots + \xi_n$;

(b) $P(\xi_i = 1, \xi_j = 1 \mid S_n) = \dfrac{S_n(S_n - 1)}{n(n - 1)}$, where $i \ne j$.

Problem 3.2.11. As a generalization of [P §1.11, Theorem 2], prove that if $\eta_1, \dots, \eta_n$ is some set of exchangeable random variables with values in {0, 1, 2, ...}, and if $S_k = \eta_1 + \dots + \eta_k$, $1 \le k \le n$, then

$$P\{S_k < k \text{ for all } 1 \le k \le n \mid S_n\} = \Bigl(1 - \frac{S_n}{n}\Bigr)^{+}.$$
n 


3.3 The Method of Characteristic Functions for Establishing 
Limit Theorems 


Problem 3.3.1. Prove the statement in [P §3.3, Theorem 1] in the case of the spaces $R^n$, $n \ge 2$.

Hint. The proof is analogous to the one-dimensional case, except for [P §3.3, Lemma 3]. The multidimensional analog of this lemma can be stated in the form:


$$\int_A dF(x) \le \frac{k}{a^n} \int_B \bigl(1 - \operatorname{Re}\varphi(t)\bigr)\,dt,$$
where 
A= fxe R”: pals i.. [Xal < 1}, 


$$B = \{t \in R^n : 0 \le t_1 \le a, \dots, 0 \le t_n \le a\}.$$
Problem 3.3.2. (The law of large numbers.)

(a) Let $\xi_1, \xi_2, \dots$ be any sequence of independent random variables with finite expected values $E|\xi_n|$ and dispersions $D\xi_n \le K$, $n \ge 1$. Prove that the law of large numbers holds: for every $\varepsilon > 0$

$$P\Bigl\{\Bigl|\frac{\xi_1 + \dots + \xi_n}{n} - \frac{E(\xi_1 + \dots + \xi_n)}{n}\Bigr| \ge \varepsilon\Bigr\} \to 0 \quad\text{as } n \to \infty. \qquad (*)$$

(b) Let $\xi_1, \xi_2, \dots$ be any sequence of random variables with finite expected values $E|\xi_n|$, dispersions $D\xi_n \le K$, $n \ge 1$, and covariances $\mathrm{cov}(\xi_i, \xi_j) \le 0$, $i \ne j$. Prove that the law of large numbers (∗) holds.

Hint. To prove (a) and (b), use Chebyshev's inequality.

(c) (S. N. Bernstein.) Let $\xi_1, \xi_2, \dots$ be any sequence of random variables with finite expected values $E|\xi_n|$ and dispersions $D\xi_n \le K$, $n \ge 1$, and suppose that the covariances are such that $\mathrm{cov}(\xi_i, \xi_j) \to 0$ as $|i - j| \to \infty$. Prove that when these conditions are satisfied the law of large numbers (∗) holds.

Hint. Convince yourself that under the specified conditions one has $D(\xi_1 + \dots + \xi_n)/n^2 \to 0$ as $n \to \infty$.


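(d) Let $\xi_1, \xi_2, \dots$ be independent and identically distributed random variables, let $\mu_n = E[\xi_1 I(|\xi_1| \le n)]$, and suppose that

$$\lim_{x\to\infty} x\,P\{|\xi_1| > x\} = 0.$$

Prove the following version of the law of large numbers:

$$\frac{S_n}{n} - \mu_n \xrightarrow{P} 0,$$

where, as usual, $S_n = \xi_1 + \dots + \xi_n$. (See also Problem 3.3.20.)
Hint. Given some $s > 0$, set $\xi_i^{(s)} = \xi_i I(|\xi_i| \le s)$ and $m_n^{(s)} = E[\xi_1^{(s)} + \dots + \xi_n^{(s)}]$, and prove that

$$P\{|\xi_1 + \dots + \xi_n - m_n^{(s)}| > t\} \le t^{-2} D\bigl(\xi_1^{(s)} + \dots + \xi_n^{(s)}\bigr) + P\{\xi_1 + \dots + \xi_n \ne \xi_1^{(s)} + \dots + \xi_n^{(s)}\}.$$

By using this estimate, convince yourself (setting $s = n$ and $t = \varepsilon n$, $\varepsilon > 0$) that

$$P\Bigl\{\Bigl|\frac{S_n}{n} - E[\xi_1 I(|\xi_1| \le n)]\Bigr| > \varepsilon\Bigr\} \le \frac{2}{\varepsilon^2 n} \int_0^n x\,P\{|\xi_1| > x\}\,dx + n\,P\{|\xi_1| > n\},$$

which leads to the desired property.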
Problem 3.3.3. In the setting of Theorem 1, prove that the family $\{\varphi_n,\ n \ge 1\}$ is uniformly equicontinuous and the convergence $\varphi_n \to \varphi$ is uniform on every finite interval.

Hint. The uniform equicontinuity of the family $\{\varphi_n,\ n \ge 1\}$ means that for every $\varepsilon > 0$ one can find a $\delta > 0$, such that, for every $n \ge 1$ and every s, t with $|t - s| < \delta$, one has $|\varphi_n(t) - \varphi_n(s)| < \varepsilon$.

Assuming that $F_n \xrightarrow{w} F$, the Prokhorov Theorem (see [P §3.2, Theorem 1]) implies that, given any $\varepsilon > 0$, one can find some $a > 0$ so that $\int_{|x| \ge a} dF_n < \varepsilon$, $n \ge 1$. Consequently,

$$|\varphi_n(t + h) - \varphi_n(t)| \le \int_{|x| \le a} |e^{ihx} - 1|\,dF_n + 2\varepsilon,$$

from where the desired uniform equicontinuity property easily follows. By using this property one can prove that

$$\sup_{t \in [a, b]} |\varphi_n(t) - \varphi(t)| \to 0 \quad\text{as } n \to \infty,$$

for every finite interval [a, b].


Problem 3.3.4. Let $\xi_n$, $n \ge 1$, be any sequence of random variables with characteristic functions $\varphi_{\xi_n}(t)$, $n \ge 1$. Prove that $\xi_n \xrightarrow{d} 0$ if and only if $\varphi_{\xi_n}(t) \to 1$ as $n \to \infty$, in some neighborhood of the point $t = 0$.

Hint. For the proof of the sufficiency part, consider using Lemma 3, according to which the family of measures $\{\mathrm{Law}(\xi_n),\ n \ge 1\}$ is tight.


Problem 3.3.5. Let $X_1, X_2, \dots$ be independent and identically distributed random vectors in $R^k$ with vanishing mean and with (finite) covariance matrix $\Gamma$. Prove that

$$\frac{X_1 + \dots + X_n}{\sqrt{n}} \xrightarrow{d} N(0, \Gamma).$$

(Comp. with Theorem 3.)
Hint. According to Problem 3.3.1, it is enough to prove that, for every $t \in R^k$, one has

$$E e^{i(t, \xi_n)} \to E e^{i(t, \xi)} \quad\text{as } n \to \infty,$$

where $\xi_n = n^{-1/2}(X_1 + \dots + X_n)$ and $\xi \sim N(0, \Gamma)$.
Problem 3.3.6. Let $\xi_1, \xi_2, \dots$ and $\eta_1, \eta_2, \dots$ be two sequences of random variables, chosen so that $\xi_n$ and $\eta_n$ are independent for every n, and suppose that $\xi_n \xrightarrow{d} \xi$ and $\eta_n \xrightarrow{d} \eta$ as $n \to \infty$, where $\xi$ and $\eta$ are also independent.
(a) Prove that the sequence of bi-variate random variables $(\xi_n, \eta_n)$ converges in distribution to $(\xi, \eta)$.
(b) Let $f = f(x, y)$ be any continuous function. Verify that the sequence $f(\xi_n, \eta_n)$ converges in distribution to $f(\xi, \eta)$.

Hint. The convergence $(\xi_n, \eta_n) \xrightarrow{d} (\xi, \eta)$ obtains from the statement in Problem 3.3.1. In order to establish the convergence $f(\xi_n, \eta_n) \xrightarrow{d} f(\xi, \eta)$, consider the composition $\varphi \circ f : R^2 \to R$, where $\varphi : R \to R$ is some continuous and bounded function.


Problem 3.3.7. By constructing an appropriate example, prove that in part (2) of [P §3.3, Theorem 1] the continuity condition at 0 for the limiting characteristic function $\varphi(t) = \lim_n \varphi_n(t)$ cannot be weakened in general. (In other words, if $\varphi(t)$ is not continuous at 0, then it is possible that $\varphi_n(t) \to \varphi(t)$, but there is no function F for which $F_n \xrightarrow{w} F$.) Convince yourself by way of example that if the continuity at 0 for the limiting function $\varphi(t)$ fails, then the family of probability distributions $\{P_n,\ n \ge 1\}$, with characteristic functions $\varphi_n(t)$, $n \ge 1$, may no longer be tight.

Hint. Take $F_n$ to be the distribution function of a Gaussian random variable with mean 0 and dispersion n.
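The example in the hint can be explored numerically. For a Gaussian law with mean 0 and dispersion n the characteristic function is $\varphi_n(t) = e^{-nt^2/2}$, so $\varphi_n(0) = 1$ while $\varphi_n(t) \to 0$ for every $t \ne 0$, and the mass outside any fixed interval [−a, a] equals $\mathrm{erfc}\bigl(a/\sqrt{2n}\bigr) \to 1$. The short sketch below evaluates both quantities; the function names are chosen for this illustration.

```python
import math

def gaussian_cf(t, variance):
    """Characteristic function of N(0, variance) at the point t."""
    return math.exp(-0.5 * variance * t * t)

def limiting_cf(t):
    """Pointwise limit of gaussian_cf(t, n) as n -> infinity (discontinuous at 0)."""
    return 1.0 if t == 0 else 0.0

def mass_outside(a, variance):
    """P{|xi| > a} for xi ~ N(0, variance); it tends to 1 as the variance grows."""
    return math.erfc(a / math.sqrt(2.0 * variance))

for n in (1, 100, 10_000, 1_000_000):
    print(n, gaussian_cf(0.0, n), gaussian_cf(0.1, n), mass_outside(10.0, n))
```

The printed rows show the characteristic function collapsing to the indicator of {0} while the mass escapes from [−10, 10], which is the loss of tightness the problem refers to.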


Problem 3.3.8. As an extension to inequality [P §3.3, (4)] from [P §3.3, Lemma 3], prove that if $\xi$ is a random variable with characteristic function $\varphi(t)$, then:

(a) For any $a > 0$ one has


2 
Pilsa is ah lo(t)| dt. 


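(b) For any positive $b$ and $\delta$ one has

$$P\{|\xi| \ge b\} \le \Bigl(1 + \frac{2\pi}{b\delta}\Bigr)^2 \frac{1}{\delta} \int_0^{\delta} [1 - \operatorname{Re}\varphi(t)]\,dt.$$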


(c) If $\xi$ is a non-negative random variable and $\psi(a) = E e^{-a\xi}$, $a \ge 0$, is its Laplace transform, then

$$P\{\xi \ge a^{-1}\} \le 2(1 - \psi(a)).$$


Problem 3.3.9. Suppose that $\xi, \xi_1, \xi_2, \dots$ is some sequence of random vectors in $R^n$. Prove that $\xi_k \xrightarrow{d} \xi$ as $k \to \infty$ if and only if for any vector $t \in R^n$ one has the following convergence of the respective scalar products

$$(\xi_k, t) \xrightarrow{d} (\xi, t).$$

(This result is the basis for the Cramér–Wold method, which comes down to replacing the test for convergence in distribution of random vectors from $R^n$ with the test for convergence in distribution of certain scalar random variables.)
Problem 3.3.10. As a continuation to Theorem 2, which is known as Khinchin's law of large numbers (or Khinchin's criterion), prove the following statement.

Let $\xi_1, \xi_2, \dots$ be some sequence of independent and identically distributed random variables and let $S_n = \xi_1 + \dots + \xi_n$, $0 < p < 2$. Then there is a constant, $c \in R$, for which

$$n^{-1/p} S_n \xrightarrow{P} c,$$

if and only if one can claim that as $r \to \infty$:
(a) $r^p P\{|\xi_1| > r\} \to 0$ and $c = 0$, if $p < 1$;
(b) $r P\{|\xi_1| > r\} \to 0$ and $E[\xi_1 I(|\xi_1| \le r)] \to c$, if $p = 1$;
(c) $r^p P\{|\xi_1| > r\} \to 0$ and $E\xi_1 = c = 0$, if $p > 1$.

Problem 3.3.11. Let $\xi_1, \xi_2, \dots$ be a sequence of independent and identically distributed random variables and let $S_n = \xi_1 + \dots + \xi_n$. Prove that the variables $n^{-1/2} S_n$ converge in probability as $n \to \infty$ if and only if $P\{\xi_1 = 0\} = 1$.


Problem 3.3.12. Let $F(x)$ and $(F_n(x))_{n\ge 1}$ be some distribution functions and let $\varphi(t)$ and $(\varphi_n(t))_{n\ge 1}$ be their respective characteristic functions. Prove that if

$$\sup_t |\varphi_n(t) - \varphi(t)| \to 0,$$

then

$$\sup_x |F_n(x) - F(x)| \to 0.$$


Problem 3.3.13. Let $\xi_1, \xi_2, \dots$ be independent and identically distributed random variables with distribution function $F = F(x)$ and let $S_n = \xi_1 + \dots + \xi_n$, $n \ge 1$. Prove the following version of the law of large numbers (due to A. N. Kolmogorov): for the existence of a sequence of numbers $(a_n)_{n\ge 1}$, such that

$$\frac{S_n}{n} - a_n \xrightarrow{P} 0 \quad\text{as } n \to \infty, \qquad (*)$$

it is necessary and sufficient that

$$n\,P\{|\xi_1| > n\} \to 0 \quad\text{as } n \to \infty, \qquad (**)$$

or, equivalently, that

$$x[1 - F(x) + F(-x)] \to 0 \quad\text{as } x \to \infty.$$

Furthermore, when these conditions hold one has $a_n - E(\xi_1 I(|\xi_1| \le n)) \to 0$ as $n \to \infty$. (The existence of a sequence $(a_n)_{n\ge 1}$ for which the property (∗) holds is known as "stability of the sequence $(S_n/n)_{n\ge 1}$ in the sense of Kolmogorov".)


Problem 3.3.14. In the context of the previous problem, prove that if $E|\xi_1| < \infty$ then the condition (∗∗) holds and it is possible to take $a_n \equiv m$, where $m = E\xi_1$. (Comp. with [P §3.3, Theorem 2], Khinchin's criterion for the law of large numbers, and Problems 3.3.10 and 3.3.13.)


Problem 3.3.15. Let $\xi_1, \xi_2, \dots$ be any sequence of independent and identically distributed random variables that take values ±3, ±4, ... with probabilities

$$P\{\xi_1 = \pm x\} = \frac{c}{2x^2 \ln x}, \quad x = 3, 4, \dots,$$

where the normalizing constant c is given by

$$c = \Bigl(\sum_{x=3}^{\infty} \frac{1}{x^2 \ln x}\Bigr)^{-1}.$$

Prove that in this case $E|\xi_1| = \infty$, but condition (∗∗) from Problem 3.3.13 holds and it is possible to take $a_n \equiv 0$, i.e., with this choice one has $S_n/n \xrightarrow{P} 0$.

Remark. As the random variables $\xi_1, \xi_2, \dots$ do not possess finite first moments ($E|\xi_1| = \infty$), it is not possible to formulate the law of large numbers in the sense of Khinchin ($n^{-1} S_n \xrightarrow{P} m$, where $m = E\xi_1$; see [P §3.3, Theorem 2]). Nevertheless the random variables $\xi_1, \xi_2, \dots$ exhibit stability in the sense of Kolmogorov (see Problem 3.3.13), in that

$$\frac{S_n}{n} \xrightarrow{P} m \ (= 0),$$

where $m = \widetilde{E}\xi_1$ is the generalized expected value, which was defined by A. N. Kolmogorov (see [66, Chap. VI, §4]) by the formula

$$\widetilde{E}\xi_1 = \lim_n E\bigl(I(|\xi_1| \le n)\,\xi_1\bigr).$$

Later A. N. Kolmogorov called this generalized expected value the "A-integral". (It is common in analysis to say that the function $f = f(x)$, $x \in R$, is A-integrable, if:

(i) $f$ belongs to the space $L^1$ in weak sense (i.e., $\lim_n n\lambda\{x : |f(x)| > n\} = 0$); and

(ii) The limit $\lim_n \int_{\{x :\, |f(x)| \le n\}} f(x)\,\lambda(dx)$ exists, where $\lambda$ is the Lebesgue measure on $(R, \mathscr{B}(R))$.

Usually this integral is denoted by $(A)\int f(x)\,\lambda(dx)$. One must be aware that many of the usual properties of the Lebesgue integral, the additivity property for example, may not hold for the A-integral.)


Problem 3.3.16. Let $\xi_1, \xi_2, \dots$ be a sequence of independent random variables (with finite expected values), such that

$$\frac{1}{n^{1+\delta}} \sum_{i=1}^{n} E|\xi_i|^{1+\delta} \to 0 \quad\text{as } n \to \infty,$$

for some $\delta \in (0, 1)$. Prove that this "(1 + δ)-condition" guarantees that the law of large numbers is in force, i.e.,

$$\frac{1}{n} \sum_{i=1}^{n} (\xi_i - E\xi_i) \xrightarrow{P} 0 \quad\text{as } n \to \infty.$$


Problem 3.3.17. (Restatement of [P §3.3, Theorem 3] for the case of non-identically distributed random variables.) By using the continuity theorem ([P §3.3, Theorem 1]) and the method of characteristic functions, the central limit theorem was established in Theorem 3 in the case of independent and identically distributed random variables. By using the same method, prove the central limit theorem for the case of independent but not necessarily identically distributed random variables by using the following scheme.

Let $A_1, A_2, \dots$ be a sequence of independent events, chosen so that $P(A_n) = 1/n$ (for examples of such events, see Problem 2.4.21). Setting $\xi_n = I_{A_n}$ and $S_n = \xi_1 + \dots + \xi_n$, prove that

$$ES_n = \sum_{k \le n} \frac{1}{k} \quad (\sim \ln n \text{ as } n \to \infty),$$

$$DS_n = \sum_{k \le n} \frac{1}{k}\Bigl(1 - \frac{1}{k}\Bigr) \quad (\sim \ln n \text{ as } n \to \infty).$$

Next, consider the characteristic functions $\varphi_n(t)$ of the random variables $\frac{S_n - ES_n}{\sqrt{DS_n}}$, $n > 1$, and prove that $\varphi_n(t) \to e^{-t^2/2}$. Finally, conclude that the central limit theorem ([P §3.3, Theorem 1]) holds: as $n \to \infty$ one has

$$P\Bigl\{\frac{S_n - ES_n}{\sqrt{DS_n}} \le x\Bigr\} \to \Phi(x), \quad x \in R.$$
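The two asymptotic relations for $ES_n$ and $DS_n$ are easy to tabulate. The sketch below computes both sums directly and compares them with $\ln n$; it uses only the formulas stated in the problem, and the function name is a choice made for this illustration. The known expansions $\sum_{k\le n} 1/k = \ln n + \gamma + O(1/n)$ (with Euler's constant $\gamma \approx 0.5772$) and $\sum_{k\ge 1} 1/k^2 = \pi^2/6$ explain why the ratios approach 1 only at a logarithmic rate.

```python
import math

def mean_and_variance(n):
    """ES_n and DS_n for S_n = I(A_1) + ... + I(A_n) with independent A_k, P(A_k) = 1/k."""
    mean = sum(1.0 / k for k in range(1, n + 1))
    variance = sum((1.0 / k) * (1.0 - 1.0 / k) for k in range(1, n + 1))
    return mean, variance

for n in (10, 1_000, 100_000):
    mean, variance = mean_and_variance(n)
    print(n, mean / math.log(n), variance / math.log(n))
```

Since $ES_n - DS_n = \sum_{k \le n} k^{-2}$ stays bounded, both quantities are indeed equivalent to $\ln n$, although the ratio for the variance is still visibly below 1 at $n = 10^5$.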


Problem 3.3.18. As a supplement to inequality [P §3.3, (4)] from [P §3.3, Lemma 3], show that the following double-sided inequality holds for any $a > 0$:


(—sin1) dF (x) < fu ~Reg(t)|dt < 2 dF (x) + =. 


lx|>1/a x|>/T/a 
Hint. To prove the right inequality, write $1 - \operatorname{Re}\varphi(t)$ in the form

$$1 - \operatorname{Re}\varphi(t) = \int_R (1 - \cos tx)\,dF(x) = \int_{|x| \ge \sqrt{1/a}} (1 - \cos tx)\,dF(x) + \int_{|x| < \sqrt{1/a}} (1 - \cos tx)\,dF(x),$$

estimate the above integrals in the obvious way, and, just as in Lemma 3, use Fubini's theorem.
Problem 3.3.19. Let $(\xi_n)_{n\ge 1}$ be a sequence of independent random variables, distributed according to the Cauchy law with density

$$\frac{\theta}{\pi(\theta^2 + x^2)}, \quad \theta > 0,\ x \in R.$$

Prove that the distributions $F_n$ of the random variables $\frac{1}{n}\max_{i \le n} \xi_i$ converge weakly to the Fréchet distribution with parameter $\alpha = 1$ (see Problem 2.8.48), i.e., the distribution of a random variable of the form $1/T_c$, where $T_c$ has exponential distribution with parameter $c = \theta/\pi$:

$$P\Bigl\{\frac{1}{T_c} \le x\Bigr\} = e^{-c/x}, \quad x > 0.$$
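Because the maximum of independent variables has distribution function $F(x)^n$, the convergence can be inspected without any simulation. The sketch below uses the Cauchy distribution function $F(x) = \frac12 + \frac{1}{\pi}\arctan(x/\theta)$, which corresponds to the density in the problem, and compares $F(nx)^n$ with the limit $e^{-\theta/(\pi x)}$. The function names and the chosen values of $\theta$ and x are assumptions of this illustration.

```python
import math

def cauchy_cdf(x, theta):
    """Distribution function of the Cauchy law with scale parameter theta."""
    return 0.5 + math.atan(x / theta) / math.pi

def scaled_max_cdf(x, n, theta):
    """P{ max(xi_1, ..., xi_n) / n <= x } = F(n x)^n for i.i.d. xi_i."""
    return cauchy_cdf(n * x, theta) ** n

def frechet_limit(x, theta):
    """Limit distribution function exp(-c / x) with c = theta / pi, for x > 0."""
    return math.exp(-theta / (math.pi * x))

theta, x = 2.0, 1.5
for n in (10, 1_000, 1_000_000):
    print(n, scaled_max_cdf(x, n, theta), frechet_limit(x, theta))
```

The agreement reflects the tail relation $n(1 - F(nx)) \to \theta/(\pi x)$, which is the computation the problem asks for.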


Problem 3.3.20. (Continuity theorem for discrete random variables.) Let $\xi, \xi_1, \xi_2, \dots$ be a sequence of random variables taking integer values $k = 0, 1, 2, \dots$ and let

$$G(s) = \sum_{k=0}^{\infty} P\{\xi = k\}\,s^k \quad\text{and}\quad G_n(s) = \sum_{k=0}^{\infty} P\{\xi_n = k\}\,s^k$$

be the generating functions, respectively, of the variables $\xi$ and $\xi_n$, $n \ge 1$.
Prove that

$$\lim_n P\{\xi_n = k\} = P\{\xi = k\}, \quad k = 0, 1, 2, \dots,$$

if and only if

$$\lim_n G_n(s) = G(s), \quad s \in [0, 1).$$


Problem 3.3.21. Prove the statement in Problem 2.10.35 by using the method of characteristic functions.
Hint. The characteristic function of the random variable U, which is uniformly distributed in the interval [−1, 1], is the function $\frac{\sin t}{t}$.


3.4 The Central Limit Theorem for Sums of Independent 
Random Variables I. Lindeberg’s Condition 


Problem 3.4.1. Let $\xi_1, \xi_2, \dots$ be a sequence of independent and identically distributed random variables with $E\xi_1^2 < \infty$. Prove that (comp. with Problem 2.10.53)

$$\frac{\max(|\xi_1|, \dots, |\xi_n|)}{\sqrt{n}} \xrightarrow{d} 0 \quad\text{as } n \to \infty.$$

Hint. Use the relation

$$P\Bigl\{\frac{\max(|\xi_1|, \dots, |\xi_n|)}{\sqrt{n}} \le \varepsilon\Bigr\} = \bigl[P\{\xi_1^2 \le n\varepsilon^2\}\bigr]^n$$

and the fact that $n\varepsilon^2 P\{\xi_1^2 > n\varepsilon^2\} \to 0$ as $n \to \infty$.


Problem 3.4.2. Give a direct proof of the fact that in the Bernoulli scheme the variable $\sup_x |F_{T_n}(x) - \Phi(x)|$ has order $\frac{1}{\sqrt{n}}$ as $n \to \infty$.


Problem 3.4.3. Let $X_1, X_2, \dots$ be any infinite sequence of exchangeable random variables (see Problem 2.5.4) with $EX_n = 0$, $EX_n^2 = 1$, $n \ge 1$, and let

$$\mathrm{cov}(X_1, X_2) = \mathrm{cov}(X_1^2, X_2^2). \qquad (*)$$

Prove that the central limit theorem holds for any such sequence, i.e.,

$$\frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i \xrightarrow{d} N(0, 1). \qquad (**)$$

Conversely, if $EX_n^4 < \infty$, $n \ge 1$, then (∗∗) implies (∗).


Problem 3.4.4. (a) (The local limit theorem for random variables on a lattice.) Let $\xi_1, \xi_2, \dots$ be independent and identically distributed random variables with mean value $\mu = E\xi_1$ and with dispersion $\sigma^2 = D\xi_1$. Set $S_n = \xi_1 + \dots + \xi_n$, $n \ge 1$, and suppose that the variables $\xi_1, \xi_2, \dots$ take values on a lattice of step-size $h > 0$, i.e., take the values $a + hk$, $k = 0, \pm 1, \pm 2, \dots$, for some $h > 0$.
Prove that as $n \to \infty$ one has

$$\sup_k \Bigl|\frac{\sqrt{n}}{h}\,P\{S_n = an + hk\} - \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Bigl\{-\frac{(an + hk - n\mu)^2}{2\sigma^2 n}\Bigr\}\Bigr| \to 0.$$

(Comp. with the local limit theorem in [P §2.6].)
Hint. The proof can be carried out with the following line of reasoning, which involves characteristic functions. By Problem 2.12.9 one can write

$$P\{S_n = an + hk\} = \frac{h}{2\pi} \int_{|u| \le \pi/h} e^{-iu(an + hk)}\,[\varphi(u)]^n\,du,$$

where $\varphi(u)$ stands for the characteristic function of the variable $\xi_1$. It is clear that

$$e^{-z^2/2} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iuz}\,e^{-u^2/2}\,du$$

and, therefore, with $z = (an + hk - n\mu)/(\sigma\sqrt{n})$,

$$2\pi\,\Bigl|\frac{\sigma\sqrt{n}}{h}\,P\{S_n = an + hk\} - \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\Bigr| \le \int_{|t| \le \pi\sigma\sqrt{n}/h} \Bigl|\Bigl[e^{-it\mu/(\sigma\sqrt{n})}\,\varphi\Bigl(\frac{t}{\sigma\sqrt{n}}\Bigr)\Bigr]^n - e^{-t^2/2}\Bigr|\,dt + \int_{|t| > \pi\sigma\sqrt{n}/h} e^{-t^2/2}\,dt.$$

The expression in the right side does not depend on k and it only remains to show that as $n \to \infty$ this expression converges to 0.

(b) (The local limit theorem for random variables with density.) Let $\xi_1, \xi_2, \dots$ be independent and identically distributed random variables with mean value $\mu = E\xi_1$ and dispersion $\sigma^2 = D\xi_1$. Suppose that the characteristic function $\varphi = \varphi(t)$ of the variable $\xi_1$ is integrable and, consequently, $\xi_1$ admits a probability density given by

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\,\varphi(t)\,dt$$

(see [P §2.12, Theorem 3]).
Let $f_n = f_n(x)$ denote the probability density function of the variable $S_n = \xi_1 + \dots + \xi_n$, $n \ge 1$. Prove that as $n \to \infty$ one has

$$\sup_x \Bigl|\sqrt{n}\,f_n(x) - \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Bigl\{-\frac{(x - n\mu)^2}{2\sigma^2 n}\Bigr\}\Bigr| \to 0.$$

Hint. Follow the argument used in the case of lattice-valued random variables.


Problem 3.4.5. Let $X_1, X_2, \dots$ be independent and identically distributed random variables with $EX_1 = 0$ and $EX_1^2 = 1$, and let $d_1, d_2, \dots$ be any sequence of non-negative constants, such that $d_n = o(D_n)$, where $D_n^2 = \sum_{k=1}^{n} d_k^2$. Prove that the "weighted sequence" $d_1X_1, d_2X_2, \dots$ satisfies the central limit theorem:

$$\frac{1}{D_n} \sum_{k=1}^{n} d_k X_k \xrightarrow{d} N(0, 1).$$


Problem 3.4.6. Let $\xi_1, \xi_2, \dots$ be independent and identically distributed random variables with $E\xi_1 = 0$ and $E\xi_1^2 = 1$ and suppose that $(\tau_n)_{n\ge 1}$ is some sequence of random variables with values in the set {1, 2, ...}, chosen so that $\tau_n/n \xrightarrow{P} c$, where $c > 0$ is some fixed constant. Setting $S_n = \xi_1 + \dots + \xi_n$, prove that

$$\mathrm{Law}\bigl(\tau_n^{-1/2} S_{\tau_n}\bigr) \to \Phi,$$

i.e., $\tau_n^{-1/2} S_{\tau_n} \xrightarrow{d} \xi$, where $\xi \sim N(0, 1)$. (Note that the sequences $(\tau_n)_{n\ge 1}$ and $(\xi_n)_{n\ge 1}$ are not assumed to be independent.)
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Problem 3.4.7. Let &,&,... be independent and identically distributed random 
variables with E£; = 0 and E£? = 1. Prove that 


Law(n7"/? max Sm) — Law(|é|), where & ~ .V (0, 1); 


l<m< 


in other words, for any x > 0 one has 


2 i 2 1 
=1/2 —y*/2 = 
Pfn D, Sm < x} => 4 = J e dy ( a erf(x)). 


Hint. First prove the statement for symmetric Bernoulli random variables
$\xi_1, \xi_2, \ldots$ with $P\{\xi_1 = \pm 1\} = 1/2$, and then use (or, better yet, prove, which
is non-trivial) the fact that the limiting distribution would be the same for any
sequence $\xi_1, \xi_2, \ldots$ with the specified properties. (The independence of the limiting
distribution from the particular choice of the independent and identically distributed
random variables $\xi_1, \xi_2, \ldots$, with $E\xi_1 = 0$, $E\xi_1^2 = 1$, is known as the “invariance
principle”; see, for example, [10] and [17].)
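
For the symmetric Bernoulli case mentioned in the hint, the pre-limit probability can be computed exactly and compared with the limit. The sketch below is only an illustration, not the intended proof: it relies on the classical reflection principle, $P\{\max_{m \le n} S_m \ge k\} = P\{S_n \ge k\} + P\{S_n > k\}$ for integer $k \ge 1$, which is an assumption brought in from outside this problem, and the names `prob_max_at_most` and `limit_value` are hypothetical.

```python
from math import comb, erf, floor, sqrt

def prob_max_at_most(n, x):
    """Exact P{ n^(-1/2) * max_{1<=m<=n} S_m <= x } for the symmetric +-1 walk, x > 0."""
    k = floor(x * sqrt(n)) + 1          # the event is {max_m S_m < k}, with k >= 1

    def prob_sn(condition):
        # S_n = 2*j - n with probability C(n, j) / 2^n
        return sum(comb(n, j) for j in range(n + 1) if condition(2 * j - n)) / 2 ** n

    # Reflection principle (assumed): P{max_m S_m >= k} = P{S_n >= k} + P{S_n > k}
    hit = prob_sn(lambda s: s >= k) + prob_sn(lambda s: s > k)
    return 1.0 - hit

def limit_value(x):
    """sqrt(2/pi) * integral_0^x exp(-y^2/2) dy, written via the standard erf."""
    return erf(x / sqrt(2))

for n in (100, 400, 2000):
    print(n, prob_max_at_most(n, 1.0), limit_value(1.0))
```

The printed exact values should approach the limiting value as $n$ grows; the check says nothing about general $\xi_k$, for which the invariance principle is needed.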


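Problem 3.4.8. In the context of the previous problem (and hint) prove that


$$P\Bigl\{n^{-1/2} \max_{1 \le m \le n} |S_m| \le x\Bigr\} \to H(x), \quad x > 0,$$


where


$$H(x) = \frac{4}{\pi} \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1} \exp\Bigl\{-\frac{(2k+1)^2 \pi^2}{8x^2}\Bigr\}.$$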


Problem 3.4.9. Let $X_1, X_2, \ldots$ be independent random variables with


$$P\{X_n = \pm n^{\alpha}\} = \frac{1}{2n^{\beta}}, \quad P\{X_n = 0\} = 1 - \frac{1}{n^{\beta}}, \quad \text{where } 2\alpha > \beta - 1.$$


Prove that in this case the Lindeberg condition holds if and only if $0 \le \beta < 1$.


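Problem 3.4.10. Let $X_1, X_2, \ldots$ be independent random variables chosen so that
$|X_n| \le C_n$ (P-a. e.) and let $C_n = o(D_n)$, where


$$D_n^2 = \sum_{k=1}^{n} E(X_k - EX_k)^2 \to \infty.$$

Prove that


$$\frac{S_n - ES_n}{D_n} \xrightarrow{d} \mathcal{N}(0,1), \quad \text{where } S_n = X_1 + \cdots + X_n.$$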


Problem 3.4.11. Let $X_1, X_2, \ldots$ be any sequence of independent random variables
with $EX_n = 0$ and $EX_n^2 = \sigma_n^2$. In addition, suppose that this sequence satisfies the
central limit theorem and has the property


k 
= 2k)! 
e( 0.1" x] > a for some k > 1. 


i=l 
Prove that Lindeberg’s condition of order $k$ holds, i.e.,


$$\sum_{j=1}^{n} \int_{\{|x| > \varepsilon\}} |x|^k\, dF_j(x) = o(D_n^k), \quad \varepsilon > 0.$$

Note that the usual “Lindeberg condition” is of order $k = 2$ (see [ P §3.4, (1)]).


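Problem 3.4.12. Let $X = X(\lambda)$ and $Y = Y(\mu)$ be two independent random
variables having Poisson distribution with parameters, respectively, $\lambda > 0$ and
$\mu > 0$. Prove that


$$\frac{(X(\lambda) - \lambda) - (Y(\mu) - \mu)}{\sqrt{X(\lambda) + Y(\mu)}} \xrightarrow{d} \mathcal{N}(0,1) \quad \text{as } \lambda \to \infty,\ \mu \to \infty.$$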


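Problem 3.4.13. Given any $n \ge 1$, suppose that the random vector


$$\bigl(X_1^{(n)}, \ldots, X_{n+1}^{(n)}\bigr)$$


is uniformly distributed on the unit sphere in $\mathbb{R}^{n+1}$. Prove the following statement,
due to H. Poincaré: for every $x \in \mathbb{R}$,


$$\lim_{n \to \infty} P\bigl\{\sqrt{n}\, X_{n+1}^{(n)} \le x\bigr\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du.$$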


Problem 3.4.14. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent and $\mathcal{N}(0,1)$-distributed
random variables. Setting $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$, find the limiting
probability distribution (as $n \to \infty$) of the random variables


1 n 
Sk C=). n21. 
k=1 


Problem 3.4.15. Let $\xi_1, \xi_2, \ldots$ be a symmetric Bernoulli scheme (i.e., a sequence
of independent and identically distributed random variables with $P\{\xi_1 = 1\} =
P\{\xi_1 = -1\} = 1/2$) and let $S_0 = 0$ and $S_k = \xi_1 + \ldots + \xi_k$, $k \ge 1$.
Define the continuous processes $X^{(2n)} = (X_t^{(2n)})_{0 \le t \le 1}$ so that


$$X_t^{(2n)} = \frac{S_{2nt}}{\sqrt{2n}},$$


where, given any $u \ge 0$, $S_u$ is defined by way of linear interpolation from the nearest
integer values.

Hint. Prove the following (difficult but important) statements:

(a) The distributions $P^{(2n)} = \mathrm{Law}(X_t^{(2n)}, 0 \le t \le 1)$ converge (in terms of finite
dimensional distributions and in terms of the weak convergence of distributions on
the metric space $C$, endowed with the uniform distance) to the distribution law $P =
\mathrm{Law}(B_t, 0 \le t \le 1)$ of the Brownian motion $B = (B_t)_{0 \le t \le 1}$. (The statement about
the weak convergence in $C$ is a special case of the Donsker–Prokhorov invariance
principle; see [ P §7.8, 1].)

(b) The conditional distributions $Q^{(2n)} = \mathrm{Law}(X_t^{(2n)}, 0 \le t \le 1 \mid X_1^{(2n)} = 0)$ converge
(in the same sense as in (a)) to the distribution $Q = \mathrm{Law}(B_t^{\circ}, 0 \le t \le 1)$ of the
Brownian bridge $B^{\circ} = (B_t^{\circ})_{0 \le t \le 1}$.

Hint. Use the same line of reasoning as in the derivation of Kolmogorov’s
limiting distribution in [ P §3.13]. For more details see the books [10] and [17].


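Problem 3.4.16. Conclude from the results in the previous problem (and compare
these results with the statements in Problems 3.4.7 and 3.4.8) the following limiting
relations: for any $x > 0$ one has


$$(a_1) \quad P\Bigl\{\frac{1}{\sqrt{2n}} \max_{0 \le k \le 2n} S_k \le x\Bigr\} \to P\Bigl\{\max_{0 \le t \le 1} B_t \le x\Bigr\} \ \bigl(= P\{|B_1| \le x\}\bigr);$$


$$(a_2) \quad P\Bigl\{\frac{1}{\sqrt{2n}} \max_{0 \le k \le 2n} |S_k| \le x\Bigr\} \to P\Bigl\{\max_{0 \le t \le 1} |B_t| \le x\Bigr\};$$


and


$$(b_1) \quad P\Bigl(\frac{1}{\sqrt{2n}} \max_{0 \le k \le 2n} S_k \le x \,\Big|\, S_{2n} = 0\Bigr) \to P\Bigl\{\max_{0 \le t \le 1} B_t^{\circ} \le x\Bigr\};$$


$$(b_2) \quad P\Bigl(\frac{1}{\sqrt{2n}} \max_{0 \le k \le 2n} |S_k| \le x \,\Big|\, S_{2n} = 0\Bigr) \to P\Bigl\{\max_{0 \le t \le 1} |B_t^{\circ}| \le x\Bigr\}.$$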


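Problem 3.4.17. As a continuation of Problems 3.4.15 and 3.4.16, verify the
following relations:


$$(a) \quad P\Bigl\{\frac{1}{\sqrt{2n}} \Bigl[\max_{0 \le k \le 2n} S_k - \min_{0 \le k \le 2n} S_k\Bigr] \le x\Bigr\} \to P\Bigl\{\max_{0 \le t \le 1} B_t - \min_{0 \le t \le 1} B_t \le x\Bigr\};$$


and


$$(b) \quad P\Bigl(\frac{1}{\sqrt{2n}} \Bigl[\max_{0 \le k \le 2n} S_k - \min_{0 \le k \le 2n} S_k\Bigr] \le x \,\Big|\, S_{2n} = 0\Bigr) \to P\Bigl\{\max_{0 \le t \le 1} B_t^{\circ} - \min_{0 \le t \le 1} B_t^{\circ} \le x\Bigr\}.$$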


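Problem 3.4.18. Assuming that $N \in [0, \infty)$ and $\lambda \in (0, \infty)$, prove that


$$\lim_{n} e^{-n\lambda} \sum_{k \le nN} \frac{(n\lambda)^k}{k!} = \begin{cases} 1, & \text{if } N > \lambda, \\ 1/2, & \text{if } N = \lambda, \\ 0, & \text{if } N < \lambda. \end{cases}$$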


Show also that 


im ( Gary fe, ifN <A, 
n=>o0 k! eee, ifN >A. 


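Hint. Let $(\xi_n)_{n \ge 1}$ be any sequence of independent Poisson random variables with
expected value $\lambda$, i.e., $P\{\xi_n = k\} = e^{-\lambda}\lambda^k / k!$, $k \ge 0$. Convince yourself that


$$P\Bigl\{\frac{\xi_1 + \ldots + \xi_n}{n} \le N\Bigr\} = e^{-n\lambda} \sum_{k \le nN} \frac{(n\lambda)^k}{k!}$$

and then use the central limit theorem.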
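
The three limiting regimes can be observed numerically before attempting the proof. The following sketch evaluates $e^{-n\lambda} \sum_{k \le nN} (n\lambda)^k / k!$ term by term; the function name `scaled_poisson_sum` and the sample values of $n$, $\lambda$ and $N$ are illustrative assumptions, and the terms are computed through logarithms only to avoid overflow.

```python
from math import exp, floor, lgamma, log

def scaled_poisson_sum(n, lam, N):
    """e^{-n*lam} * sum_{k <= n*N} (n*lam)^k / k!  (= P{Poisson(n*lam) <= n*N})."""
    mean = n * lam
    kmax = floor(n * N)
    # log of each term: -mean + k*log(mean) - log(k!)
    return sum(exp(-mean + k * log(mean) - lgamma(k + 1)) for k in range(kmax + 1))

lam = 1.5
for N in (1.0, 1.5, 2.0):
    print(N, [round(scaled_poisson_sum(n, lam, N), 4) for n in (10, 100, 2000)])
```

For $N = \lambda$ the values approach $1/2$ only at the rate $n^{-1/2}$, which is consistent with a central limit theorem argument.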
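

Problem 3.4.19. Prove that


$$\frac{1}{n!} \int_0^{n+1} x^n e^{-x}\, dx \to \frac{1}{2} \quad \text{as } n \to \infty$$


and, more generally, that


$$\lim_{n} \frac{1}{\Gamma(n+1)} \int_0^{y\sqrt{n+1} + (n+1)} x^n e^{-x}\, dx = \Phi(y), \quad y \ge 0.$$

Show also that


$$1 + \frac{n}{1!} + \frac{n^2}{2!} + \ldots + \frac{n^n}{n!} \sim \frac{1}{2}\, e^n \quad \text{as } n \to \infty.$$


Hint. Use the result from Problem 2.8.80 and the statement in the previous
problem (in the case $N = \lambda$).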


Problem 3.4.20. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables such that the expected value $\mu = E\xi_1$ is well defined
and $\sigma^2 = D\xi_1 < \infty$. Setting $S_0 = 0$ and $S_n = \xi_1 + \ldots + \xi_n$, prove that the
sequence of partial maxima $M_n = \max(S_0, S_1, \ldots, S_n)$, $n \ge 0$, satisfies the central
limit theorem: if $0 < \mu < \infty$, then


$$\lim_{n} P\Bigl\{\frac{M_n - n\mu}{\sigma\sqrt{n}} \le x\Bigr\} = \Phi(x), \quad x \in \mathbb{R},$$


where $\Phi(x)$ is the distribution function of the standard normal distribution.


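Problem 3.4.21. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E\xi_1 = 0$ and $E\xi_1^2 = \sigma^2$. Set $S_n = \xi_1 + \ldots + \xi_n$,
$n \ge 1$, and let $\{N_t, t \ge 0\}$ be any family of random variables with values in the set
$\{1, 2, \ldots\}$, such that $N_t / t \xrightarrow{P} \lambda$ as $t \to \infty$, $0 < \lambda < \infty$.
Prove the following version of the central limit theorem (due to F. J. Anscombe):
as $t \to \infty$ one has


$$P\Bigl\{\frac{S_{N_t}}{\sigma\sqrt{N_t}} \le x\Bigr\} \to \Phi(x) \quad \text{and} \quad P\Bigl\{\frac{S_{N_t}}{\sigma\sqrt{\lambda t}} \le x\Bigr\} \to \Phi(x).$$


Hint. For the sake of simplicity set $\sigma = 1$ and let $n_0 = \lfloor \lambda t \rfloor$. The expression
$S_{N_t} / \sqrt{N_t}$ can now be written in the form


$$\frac{S_{N_t}}{\sqrt{N_t}} = \Bigl(\frac{S_{n_0}}{\sqrt{n_0}} + \frac{S_{N_t} - S_{n_0}}{\sqrt{n_0}}\Bigr) \sqrt{\frac{n_0}{N_t}}.$$


Since $P\{S_{n_0} / \sqrt{n_0} \le x\} \to \Phi(x)$ and $n_0 / N_t \xrightarrow{P} 1$ as $t \to \infty$, it only remains
to show that $(S_{N_t} - S_{n_0}) / \sqrt{n_0} \xrightarrow{P} 0$. For that purpose it is enough to write the
probability $P\{|S_{N_t} - S_{n_0}| > \varepsilon\sqrt{n_0}\}$ as the sum


$$P\{|S_{N_t} - S_{n_0}| > \varepsilon\sqrt{n_0},\ N_t \in [n_1, n_2]\} + P\{|S_{N_t} - S_{n_0}| > \varepsilon\sqrt{n_0},\ N_t \notin [n_1, n_2]\},$$


with $n_1 = \lfloor n_0(1 - \varepsilon^3) \rfloor + 1$, $n_2 = \lfloor n_0(1 + \varepsilon^3) \rfloor$. The convergence of the
above probabilities to 0 can be established by using Kolmogorov’s inequality (see
[P §4.2]).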


Problem 3.4.22. (On the convergence of moments in the central limit theorem.)
Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically distributed random
variables with $E\xi_1 = 0$ and $E\xi_1^2 = \sigma^2 < \infty$. According to [P §3.3, Theorem 3]
and part (b) of [ P §3.4, Theorem 1], one has


$$\frac{S_n}{\sigma\sqrt{n}} \xrightarrow{d} N,$$


where $N$ is a standard normal ($\mathcal{N}(0,1)$) random variable.
Prove that if $E|\xi_1|^r < \infty$, for some $r \ge 2$, then for any $0 < p \le r$ the following
convergence of moments takes place:


$$E\Bigl|\frac{S_n}{\sigma\sqrt{n}}\Bigr|^p \to E|N|^p.$$


Hint. Prove that the family of random variables $\bigl\{\bigl|\frac{S_n}{\sigma\sqrt{n}}\bigr|^r,\ n \ge 1\bigr\}$ is uniformly
integrable and then use the statement in part (b) of Problem 2.10.54.


Problem 3.4.23. Let $(\xi_n)_{n \ge 1}$ be a sequence of independent and identically
distributed standard normal random variables (i.e., $\xi_n \sim \mathcal{N}(0,1)$) and suppose that
the random variable $\xi \sim \mathcal{N}(0,1)$ is independent of the sequence $(\xi_n)_{n \ge 1}$.

Prove that the limit


$$\lim_{n} E\Bigl|\frac{\xi_1 + \ldots + \xi_n}{\sqrt{n}} - \xi\Bigr|$$

exists and equals $2/\sqrt{\pi}$.
Hint. Convince yourself that the family of random variables $\{S_n/\sqrt{n} - \xi :
n \ge 1\}$, where $S_n = \xi_1 + \ldots + \xi_n$, is uniformly integrable.


Problem 3.4.24. Let $P$ and $Q$ be two probability measures on $(\Omega, \mathcal{F})$, chosen so
that $Q$ is absolutely continuous with respect to $P$ ($Q \ll P$), and suppose that,
relative to $P$, $X_1, X_2, \ldots$ is a sequence of independent and identically distributed
random variables with $m = E_P X_1$, $\sigma^2 = E_P(X_1 - m)^2$, where $E_P$ means expectation
with respect to $P$.

According to [P §3.3, Theorem 3], the central limit theorem holds: as $n \to \infty$
one has


$$P\Bigl\{\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} (X_i - m) \le x\Bigr\} \to \Phi(x), \quad x \in \mathbb{R},$$


where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$.

Now consider the measure $Q$. Even if $Q \ll P$, the sequence $X_1, X_2, \ldots$ will
not, in general, represent a sequence of independent random variables relative to $Q$.
Prove that, nevertheless, the central limit theorem still holds in the following form,
which is due to A. Rényi: as $n \to \infty$ one has


$$Q\Bigl\{\frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} (X_i - m) \le x\Bigr\} \to \Phi(x), \quad x \in \mathbb{R}.$$


Hint. One has to prove that if $f = f(x)$ is some bounded and continuous
function, then


$$E_Q f(\widehat{S}_n) \to E_P f(N(0,1)),$$

where $\widehat{S}_n = \frac{1}{\sigma\sqrt{n}} \sum_{i=1}^{n} (X_i - m)$ and $N(0,1)$ is a standard normal ($\mathcal{N}(0,1)$)
random variable. For that purpose, consider the Radon–Nikodym density $D = \frac{dQ}{dP}$
and the random variables $D_k = E_P(D \mid \mathcal{F}_k)$, where $\mathcal{F}_k = \sigma(X_1, \ldots, X_k)$, and
write $E_Q f(\widehat{S}_n)$ in the following form:


$$E_Q f(\widehat{S}_n) = E_P[(D - D_k) f(\widehat{S}_n)] + E_P[D_k f(\widehat{S}_n)].$$

Then prove that $\lim_k \sup_n |E_P[(D - D_k) f(\widehat{S}_n)]| = 0$ and
$E_P[D_k f(\widehat{S}_n)] \to E_P f(N(0,1))$ as $n \to \infty$, for every $k \ge 1$.


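Problem 3.4.25. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent and identically
distributed random variables, such that


$$P\{\xi_1 > x\} = P\{\xi_1 < -x\}, \quad x \in \mathbb{R}, \quad \text{and} \quad P\{|\xi_1| > x\} = x^{-2}, \quad x \ge 1.$$


Prove that as $n \to \infty$


$$P\Bigl\{\frac{S_n}{\sqrt{n \ln n}} \le x\Bigr\} \to \Phi(x), \quad x \in \mathbb{R},$$


where $S_n = \xi_1 + \ldots + \xi_n$.


Remark. This problem shows that, after a suitable normalization, the distribution
of the sums $S_n$ may converge to the standard normal distribution even if $E\xi_1^2 = \infty$.

Hint. Consider the random variables $\xi_{nk} = \xi_k I(|\xi_k| \le \sqrt{n} \ln\ln n)$ and convince
yourself that:

(i) $\sum_{k=1}^{n} P\{\xi_{nk} \ne \xi_k\} \to 0$ as $n \to \infty$;

(ii) $E\xi_{nk}^2 \sim \ln n$ as $n \to \infty$;

(iii) by Lindeberg’s Theorem (Theorem 1) one has

$$\frac{1}{\sqrt{n \ln n}} \sum_{k=1}^{n} \xi_{nk} \xrightarrow{d} \mathcal{N}(0,1);$$

(iv) $P\bigl\{S_n \ne \sum_{k=1}^{n} \xi_{nk}\bigr\} \to 0$ as $n \to \infty$.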


3.5 The Central Limit Theorem for Sums of Independent
Random Variables II. Non-classical Conditions


Problem 3.5.1. Prove formula [ P §3.5, (5)]. 
Hint. By using the relations 


TELLO <oo and f e aouo < œ, 
R R 


conclude that the integrals in the left and the right sides of [ P §3.5, (5)] are finite 
and then use the relation 


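$$\begin{aligned}
\int_{-\infty}^{\infty} \Bigl(e^{itx} - itx + \frac{t^2x^2}{2}\Bigr)\, d(F_{nk} - \Phi_{nk})
&= \lim_{a \to \infty} \Bigl(e^{itx} - itx + \frac{t^2x^2}{2}\Bigr)\bigl(F_{nk}(x) - \Phi_{nk}(x)\bigr)\Big|_{-a}^{a} \\
&\quad - it \int_{-\infty}^{\infty} \bigl(e^{itx} - 1 - itx\bigr)\,[F_{nk}(x) - \Phi_{nk}(x)]\, dx.
\end{aligned}$$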


Problem 3.5.2. Verify the relations [ P §3.5, (10) and (12)]. 


Problem 3.5.3. Let $N = (N_t)_{t \ge 0}$ be the renewal process introduced in [ P §2.9, 4]
(i.e., $N_t = \sum_{n=1}^{\infty} I(T_n \le t)$, $T_n = \sigma_1 + \cdots + \sigma_n$, where $\sigma_1, \sigma_2, \ldots$ is a sequence of
independent and identically distributed positive random variables). Assuming that
$\mu = E\sigma_1 < \infty$ and $0 < D\sigma_1 < \infty$, prove that the Central Limit Theorem holds:


$$\frac{N_t - t\mu^{-1}}{\sqrt{t\mu^{-3} D\sigma_1}} \xrightarrow{d} N(0,1),$$


where $N(0,1)$ is a standard normal random variable.


3.6 Infinitely Divisible and Stable Distributions 


Problem 3.6.1. Prove that if $\xi_n \xrightarrow{d} \xi$ and $\xi_n \xrightarrow{d} \eta$, then $\xi \stackrel{d}{=} \eta$.


Problem 3.6.2. Prove that if $\varphi_1$ and $\varphi_2$ are two infinitely divisible characteristic
functions, then $\varphi_1 \cdot \varphi_2$ is also an infinitely divisible characteristic function.


Problem 3.6.3. Let $\varphi_n = \varphi_n(t)$, $n \ge 1$, be infinitely divisible characteristic
functions and suppose that there is a characteristic function $\varphi = \varphi(t)$ for which
one can claim that $\varphi_n(t) \to \varphi(t)$, for each $t \in \mathbb{R}$. Prove that $\varphi(t)$ must be infinitely
divisible.

Hint. Use the fact that if $\varphi_n$ is infinitely divisible, then one can find some
independent and identically distributed random variables $\xi_1, \ldots, \xi_n$, such that $S_n =
\xi_1 + \cdots + \xi_n$ has characteristic function $\varphi_n$ and $S_n \xrightarrow{d} T$, where $T$ is infinitely
divisible.


Problem 3.6.4. Prove that the characteristic function of an infinitely divisible
distribution cannot be equal to 0 (see also Problem 3.6.12).

Hint. The required statement follows directly from the Kolmogorov–Lévy–Khinchin
formula, but one can give an independent proof by using the following
argument: if $\varphi(t)$ is the characteristic function of some infinitely divisible
distribution, then for every $n \ge 1$ one can find a characteristic function $\varphi_n(t)$, such
that $\varphi(t) = (\varphi_n(t))^n$, and, setting $\psi_n(t) = |\varphi_n(t)|^2$, prove that the function
$\psi(t) = \lim_n \psi_n(t)$ must be identically 1.


Problem 3.6.5. Prove that the gamma-distribution is infinitely divisible but is not
stable.

Hint. The proof can be constructed by analogy to the following line of reasoning.
A random variable $\xi$, which is distributed according to the Poisson law
$P\{\xi = k\} = e^{-\lambda}\lambda^k / k!$, must be infinitely divisible (see Problem 2.8.3). However,
such a random variable does not have a stable distribution. Indeed, assuming that
$\xi_1 + \xi_2 \stackrel{d}{=} a\xi + b$, where $a > 0$, $b \in \mathbb{R}$, and $\xi_1$ and $\xi_2$ are two independent copies of
$\xi$, argue that it must be that $a = 1$ and $b = 0$. This means that $\xi_1 + \xi_2 \stackrel{d}{=} \xi$, which
is not possible.


Problem 3.6.6. Prove that for a stable random variable $\xi$ one must have $E|\xi|^r < \infty$,
for all $r \in (0, \alpha)$ and all $0 < \alpha < 2$.

Hint. By using the Lévy–Khinchin representation of the characteristic function
$\varphi(t)$ of the stable random variable $\xi$, conclude that there exists some $\delta > 0$, such
that for any $t \in (0, \delta)$ and any $\alpha < 2$ one has $\operatorname{Re} \varphi(t) \ge 1 - c|t|^{\alpha}$, where $c > 0$. By
[P §3.3, Lemma 3], for $a \in (0, \delta)$ one has


1 cK 
Pilél > =} < a", 
and therefore 
P{JE = n} < e Elg|’ < 1+ JL PIEI =n} < o0, 
a+l 


n=1 


ifr e (0,a) and0 < Q <2. 


Problem 3.6.7. Prove that if $\xi$ is a stable random variable with parameter $0 < \alpha <
1$, then its characteristic function $\varphi(t)$ cannot be differentiable at $t = 0$.


Problem 3.6.8. Give a direct proof of the fact that the function $e^{-d|t|^{\alpha}}$ is a
characteristic function if $d \ge 0$ and $0 < \alpha \le 2$, but not if $d > 0$ and $\alpha > 2$.


Problem 3.6.9. Let $(b_n)_{n \ge 1}$ be any sequence of real numbers, chosen so that for all
$|t| < \delta$ and $\delta > 0$ the limit $\lim_n e^{itb_n}$ exists. Prove that $\overline{\lim}_n |b_n| < \infty$.


Hint. The statement can be proved by contradiction. Let $\overline{\lim}_n b_n = +\infty$.
Switching, if necessary, to a subsequence, one can claim that $b_n \to \infty$ as $n \to \infty$.
Then, setting $h(t) = \lim_n e^{itb_n}$ for $t \in [-\delta, \delta]$, one can write for any $[\alpha, \beta] \subseteq [-\delta, \delta]$


$$\int_{-\delta}^{\delta} I_{[\alpha, \beta]}(t)\, h(t)\, dt = \lim_{n} \int_{-\delta}^{\delta} I_{[\alpha, \beta]}(t)\, e^{itb_n}\, dt = 0.$$


By using the suitable sets principle (see [P §2.2]), it is possible to conclude that
$\int_{-\delta}^{\delta} I_A(t) h(t)\, dt = 0$, for every Borel set $A \in \mathcal{B}([-\delta, \delta])$. Consequently, one must
have $h(t) = 0$ for any $t \in [-\delta, \delta]$. At the same time, since $|e^{itb_n}| = 1$, one must
have $|h(t)| = 1$, for any $t \in [-\delta, \delta]$. This contradiction shows that $\overline{\lim}_n |b_n| < \infty$.


Problem 3.6.10. Prove that the binomial, the uniform and the triangular
distributions are not infinitely divisible. (Recall that the triangular distribution on $(-1, 1)$
has density $f(x) = (1 - |x|)\, I_{(-1,1)}(x)$.)

Prove the following more general statement: a non-degenerate distribution with
finite support cannot be infinitely divisible.


Problem 3.6.11. Suppose that the distribution function $F$ and its characteristic
function $\varphi$ admit the representations $F = F^{(n)} * \cdots * F^{(n)}$ ($n$ times) and $\varphi = [\varphi^{(n)}]^n$,
for some distribution functions $F^{(n)}$ and their respective characteristic functions
$\varphi^{(n)}$, $n \ge 1$. Prove that it is possible to find a (sufficiently “rich”) probability space
$(\Omega, \mathcal{F}, P)$ and random variables $T$ and $(\eta_k^{(n)})_{k \le n}$, $n \ge 1$, defined on that space ($T$ has
distribution $F$, while $\eta_1^{(n)}, \ldots, \eta_n^{(n)}$ are independent and identically distributed with
law $F^{(n)}$) and such that $T \stackrel{d}{=} \eta_1^{(n)} + \cdots + \eta_n^{(n)}$, $n \ge 1$.

Problem 3.6.12. Give examples of random variables that are not infinitely divisi- 
ble, and yet their characteristic functions never vanish (see Problem 3.6.4). 


Problem 3.6.13. Prove that:

(a) The function $\varphi = \varphi(t)$ is a characteristic function of an infinitely divisible
distribution if and only if for every $n \ge 1$ one can claim that the $n$-th root
$\varphi^{1/n}(t) = e^{\frac{1}{n} \ln \varphi(t)}$ (here $\ln$ stands for the principal value of the logarithmic function)
is a characteristic function.

(b) The product of finitely many characteristic functions associated with
infinitely divisible distributions is an infinitely divisible characteristic function.

(c) If the characteristic functions $\varphi_n(t)$, $n \ge 1$, associated with some infinitely
divisible distributions, converge pointwise to the function $\varphi(t)$, which
happens to be a characteristic function, then $\varphi(t)$ must be the characteristic function of some
infinitely divisible distribution.


Problem 3.6.14. By using the results established in the previous problem and the
fact that


$$\varphi(t) = \exp\{\lambda(e^{itu} - 1) + it\beta\}, \quad \lambda > 0,\ u \in \mathbb{R},\ \beta \in \mathbb{R},$$


is known to be a characteristic function of an infinitely divisible distribution law (of
Poisson type), prove that the following functions (studied by B. de Finetti) have the
same property:


$$\varphi(t) = \exp\Bigl\{\sum_{j=1}^{k} \lambda_j (e^{itu_j} - 1) + it\beta\Bigr\},$$

and


$$\varphi(t) = \exp\Bigl\{it\beta + \int_{-\infty}^{\infty} (e^{itu} - 1)\, dG(u)\Bigr\},$$


where $G = G(u)$ is some bounded and increasing function.


Problem 3.6.15. Let $\varphi = \varphi(t)$ be the characteristic function of some distribution
that has a finite second moment. Prove that $\varphi(t)$ can be a characteristic function of an
infinitely divisible distribution law if and only if it admits the so-called Kolmogorov
representation:

$$\varphi(t) = \exp \psi(t)$$

with

$$\psi(t) = itb + \int_{-\infty}^{\infty} (e^{itu} - 1 - itu)\, \frac{1}{u^2}\, dG(u),$$


where $b \in \mathbb{R}$ and $G = G(u)$ is a non-decreasing and left-continuous function with
$G(-\infty) = 0$ and $G(\infty) < \infty$ (compare with de Finetti’s function from the previous
problem).


Problem 3.6.16. Prove that if $\varphi(t)$ is the characteristic function of some infinitely
divisible distribution, then for every $\lambda > 0$ the function $\varphi^{\lambda}(t)$ is characteristic.


Problem 3.6.17. (On the Kolmogorov–Lévy–Khinchin representation.) Let $h =
h(x)$ be a cutoff function, defined for $x \in \mathbb{R}$ (i.e., a bounded and continuous function
chosen so that $h(x) = x$ in some neighborhood of $x = 0$).

Prove that:

(a) The Kolmogorov–Lévy–Khinchin representation [P §3.6, (2)] can be
rewritten in the form


$$\varphi(t) = \exp \psi_h(t)$$

with

$$\psi_h(t) = itb - \frac{t^2 c}{2} + \int_{-\infty}^{\infty} \bigl(e^{itx} - 1 - ith(x)\bigr)\, F(dx),$$


where $b = b(h) \in \mathbb{R}$, $c \ge 0$ and $F(dx)$ is a measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with $F(\{0\}) = 0$
and $\int (x^2 \wedge 1)\, F(dx) < \infty$.

(b) For two different cutoff functions $h$ and $h'$ the coefficients $b(h)$ and $b(h')$ are
linked through the relation


$$b(h') = b(h) + \int \bigl(h'(x) - h(x)\bigr)\, F(dx).$$


(c) If $\varphi(t)$ corresponds to a distribution that has a finite second moment
(compare with Problem 3.6.15), then $\int x^2\, F(dx) < \infty$.


Problem 3.6.18. Prove that the probability distribution with density


$$f(x) = \frac{1}{\sqrt{2\pi}}\, x^{-3/2}\, e^{-1/(2x)}, \quad x > 0,$$


is stable for $\alpha = \frac{1}{2}$, $\beta = 0$ and $\theta = -1$ (see formula [ P §3.6, (10)]).


Problem 3.6.19. One says that the random variable $\xi_m$ has a generalized Poisson
distribution with parameter $\lambda(\{x_m\}) > 0$, if


$$P\{\xi_m = k x_m\} = \frac{e^{-\lambda(\{x_m\})}\, \bigl(\lambda(\{x_m\})\bigr)^k}{k!},$$


where $x_m \in \mathbb{R} \setminus \{0\}$.

Let $\xi_1, \ldots, \xi_n$ be $n$ mutually independent random variables that share the above
distribution. Let $\lambda = \lambda(dx)$ denote the measure on $\mathbb{R} \setminus \{0\}$ which is supported on the
set $\{x_m : m = 1, \ldots, n\}$, consisting of $n$ different points, and let $\lambda(\{x_m\})$ denote the
mass of the point $x_m$. The probability distribution of the random variable
$T_n = \xi_1 + \ldots + \xi_n$ is known as the compound Poisson distribution.

Prove that the characteristic function $\varphi_{T_n}(t)$ of such a random variable is
given by:


$$\varphi_{T_n}(t) = \exp\Bigl\{\int_{\mathbb{R} \setminus \{0\}} (e^{itx} - 1)\, \lambda(dx)\Bigr\}.$$


(It is clear from the above formula that the compound Poisson distribution is
infinitely divisible. In conjunction with the Kolmogorov–Lévy–Khinchin formula
[P §3.6, (2)] this illustrates the “generating role” that this distribution plays in the
class of all infinitely divisible distributions. Formally this property can be stated as
follows: every infinitely divisible distribution is a (weak) limit of some sequence of
compound Poisson distributions.)


Problem 3.6.20. On the probability space $(\Omega, \mathcal{F}, P)$ consider the observation
scheme consisting of the events $A^{(n)} = (A_{nk}, 1 \le k \le n)$, $n \ge 1$, chosen so
that for every $n$ the events $A_{n1}, \ldots, A_{nn}$ are independent. Let


$$\lim_{n} \max_{1 \le k \le n} P(A_{nk}) = 0$$

and

$$\lim_{n} \sum_{k=1}^{n} P(A_{nk}) = \lambda, \quad \lambda > 0.$$

Prove the “rare events law”: the sequence of random variables $\xi^{(n)} = \sum_{k=1}^{n} I(A_{nk})$
converges in distribution to a random variable $\xi$ that has Poisson distribution with
parameter $\lambda > 0$.
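
The rare events law can be illustrated numerically in the simplest special case, which is an assumption made only for this sketch: all events in the $n$-th row have the same probability $\lambda/n$, so that the count of occurring events is binomial with parameters $(n, \lambda/n)$. The function name `tv_binomial_poisson` is hypothetical; it returns $\frac{1}{2}\sum_k |P\{\text{Bin} = k\} - P\{\text{Poisson} = k\}|$, computed through the usual recurrences for both probability mass functions.

```python
from math import exp

def tv_binomial_poisson(n, lam):
    """Half the l1-distance between Binomial(n, lam/n) and Poisson(lam); needs lam < n."""
    p = lam / n
    b = (1.0 - p) ** n          # P{Bin = 0}
    q = exp(-lam)               # P{Poisson = 0}
    total, poisson_mass = abs(b - q), q
    for k in range(1, n + 1):
        b *= (n - k + 1) / k * p / (1.0 - p)   # P{Bin = k} from P{Bin = k-1}
        q *= lam / k                           # P{Poisson = k} from P{Poisson = k-1}
        total += abs(b - q)
        poisson_mass += q
    # the Poisson mass beyond n has no binomial counterpart
    return 0.5 * (total + (1.0 - poisson_mass))

for n in (10, 100, 1000):
    print(n, tv_binomial_poisson(n, 2.0))
```

The distances shrink roughly like $1/n$ here; the general statement of the problem allows unequal probabilities within a row, which this sketch does not cover.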


Problem 3.6.21. Let $X$ and $Y$ be any two independent random variables,
distributed with Poisson law of parameter $\lambda > 0$. Find the characteristic function $\varphi(t)$
of the random variable $X - Y$, which is often referred to as a double-sided Poisson
random variable. Prove that the probability distribution of the random variable $X - Y$
is the compound Poisson distribution (see Problems 3.6.19 and 2.8.3).
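
A candidate answer for $\varphi(t)$ can be checked numerically. The sketch below compares a truncated double sum for $E e^{it(X - Y)}$ with the closed form $\exp\{2\lambda(\cos t - 1)\}$, which follows from multiplying the Poisson characteristic function $\exp\{\lambda(e^{it} - 1)\}$ by its value at $-t$. The function names and the truncation level `kmax` are assumptions made for illustration.

```python
import cmath
from math import cos, exp

def cf_numeric(lam, t, kmax=80):
    """Truncated sum for E exp(it(X - Y)), with X, Y independent Poisson(lam)."""
    pmf = [exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)      # Poisson pmf by recurrence
    return sum(pj * pk * cmath.exp(1j * t * (j - k))
               for j, pj in enumerate(pmf)
               for k, pk in enumerate(pmf))

def cf_closed_form(lam, t):
    """exp{lam(e^{it} - 1)} * exp{lam(e^{-it} - 1)} = exp{2 lam (cos t - 1)}."""
    return exp(2.0 * lam * (cos(t) - 1.0))

print(cf_numeric(1.3, 0.7), cf_closed_form(1.3, 0.7))
```

The characteristic function is real, as expected for a symmetric distribution; identifying the law as compound Poisson is still left to the reader.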


Problem 3.6.22. Let $\xi^{(n)} = (\xi_{nk}, 1 \le k \le n)$, $n \ge 1$, be an observation series of
random variables, such that for any $n$ the variables $\xi_{n1}, \ldots, \xi_{nn}$ are independent. Let
$\varphi_{nk} = \varphi_{nk}(t)$ denote the characteristic functions of the random variables $\xi_{nk}$. Prove
that the following conditions are equivalent:

(a) $\lim_n \max_{1 \le k \le n} P\{|\xi_{nk}| > \varepsilon\} = 0$ for every $\varepsilon > 0$ (limiting, or asymptotic, negligibility of the
series $\xi^{(n)}$, $n \ge 1$);

(b) $\lim_n \max_{1 \le k \le n} |1 - \varphi_{nk}(t)| = 0$ for every $t \in \mathbb{R}$.


Problem 3.6.23. The random variable $\xi$ is said to be distributed according to the
(continuous) Pareto law (with parameters $\rho > 0$, $b > 0$), if its density is given by


$$f_{\rho, b}(x) = \frac{\rho b^{\rho}}{x^{\rho + 1}}\, I(x \ge b).$$


Prove that this distribution is infinitely divisible.

Remark. The discrete Pareto Law is defined in Problem 2.8.85.


Problem 3.6.24. The random variable $\xi$ with values in $(0, \infty)$ is said to have a
logistic distribution with parameters $(\mu, \rho)$, where $\mu \in \mathbb{R}$ and $\rho > 0$, if


1 


PLE<x}= T4 e C 0e’ 


x>0. 


Prove that this distribution is infinitely divisible. 


Problem 3.7.1. Prove that, in the case of the space $E = \mathbb{R}$, the Lévy–Prokhorov
distance $L(P, \tilde{P})$ between the distribution laws $P$ and $\tilde{P}$ is not smaller than the
Lévy distance $L(F, \tilde{F})$ between the distribution functions $F$ and $\tilde{F}$, associated with
the laws $P$ and $\tilde{P}$ (see Problem 3.1.4). By constructing appropriate examples, prove
that the inequality between these two metrics can be strict.

Hint. To prove that $L(F, \tilde{F}) \le L(P, \tilde{P})$, it is enough to show that


$$L(F, \tilde{F}) = \inf\{\varepsilon > 0 : P(D) \le \tilde{P}(D^{\varepsilon}) + \varepsilon \text{ and } \tilde{P}(D) \le P(D^{\varepsilon}) + \varepsilon
\text{ for all sets } D \text{ of the form } (-\infty, x],\ x \in \mathbb{R}\}$$


and


$$L(P, \tilde{P}) = \inf\{\varepsilon > 0 : P(D) \le \tilde{P}(D^{\varepsilon}) + \varepsilon \text{ and } \tilde{P}(D) \le P(D^{\varepsilon}) + \varepsilon
\text{ for all closed sets } D \subseteq \mathbb{R}\}.$$


In order to obtain the strict inequality, take $P = \delta_0$ and $\tilde{P} = \frac{1}{2}(\delta_{-1} + \delta_1)$, where $\delta_a$
is the measure concentrated at the point $a$:


$$\delta_a(A) = \begin{cases} 1, & \text{if } a \in A, \\ 0, & \text{if } a \notin A. \end{cases}$$


In this case $L(F, \tilde{F}) = \frac{1}{2}$ and $L(P, \tilde{P}) = 1$; prove these two identities.
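
The value of the Lévy distance in this example can be checked by brute force. The sketch below assumes the usual two-sided definition, $L(F, G) = \inf\{\varepsilon > 0 : F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon \text{ for all } x\}$ (Problem 3.1.4 is not reproduced in this section, so this form is an assumption), tests the inequalities on a finite grid, and locates the threshold by bisection. All function names and grid parameters are hypothetical.

```python
def F(x):
    """Distribution function of the unit mass at 0."""
    return 1.0 if x >= 0 else 0.0

def G(x):
    """Distribution function of (delta_{-1} + delta_1) / 2."""
    return 0.0 if x < -1 else (0.5 if x < 1 else 1.0)

def levy_condition(eps, F, G, grid):
    # F(x - eps) - eps <= G(x) <= F(x + eps) + eps at every grid point
    return all(F(x - eps) - eps <= G(x) <= F(x + eps) + eps for x in grid)

def levy_distance(F, G, lo=-3.0, hi=3.0, steps=6000, tol=1e-6):
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    a, b = 0.0, 1.0                      # the condition always holds at eps = 1
    while b - a > tol:
        mid = (a + b) / 2
        if levy_condition(mid, F, G, grid):
            b = mid
        else:
            a = mid
    return b

print(levy_distance(F, G))
```

A grid check is not a proof, and it does not address the Lévy–Prokhorov distance, which requires all closed sets rather than half-lines.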


Problem 3.7.2. Prove that formula [ P §3.7, (19)] defines a metric in the space BL.

Hint. To prove that $\|P - \tilde{P}\|_{BL}^{*} = 0 \Rightarrow P = \tilde{P}$ (the remaining properties of
the metric are easy to verify), given any closed set $A$ and any $\varepsilon > 0$, consider the
function $f_A^{\varepsilon}(x)$, defined by formula [ P §3.7, (14)]. Since as $\varepsilon \downarrow 0$ one has


$$\int f_A^{\varepsilon}(x)\, P(dx) \to P(A) \quad \text{and} \quad \int f_A^{\varepsilon}(x)\, \tilde{P}(dx) \to \tilde{P}(A),$$


then $P(A) = \tilde{P}(A)$ for every closed set $A$. Finally, consider the class $\mathcal{M} = \{A \in
\mathcal{B}(E) : P(A) = \tilde{P}(A)\}$ and by using the suitable sets principle and the $\pi$-$\lambda$-systems
from [P §2.2], conclude that $\mathcal{M} = \mathcal{B}(E)$.


Problem 3.7.3. Prove the inequalities [ P §3.7, (20), (21) and (22)]. 


Problem 3.7.4. Let $F = F(x)$ and $G = G(x)$, $x \in \mathbb{R}$, be any two distribution
functions and suppose that $P_c$ and $Q_c$ are the intersecting points of their graphs
with the graph of the line $x + y = c$. Prove that the Lévy distance $L(F, G)$ (see
Problem 3.1.4) can be expressed as


$$L(F, G) = \sup_{c} \frac{\overline{P_c Q_c}}{\sqrt{2}},$$


where $\overline{P_c Q_c}$ is the length of the segment connecting the points $P_c$ and $Q_c$.


Problem 3.7.5. Prove that the space of all distribution functions is complete for the 
Lévy metric. 


Problem 3.7.6. Consider the Kolmogorov distance between the distribution
functions $F$ and $\tilde{F}$, which is given by


$$K(F, \tilde{F}) = \sup_{x} |F(x) - \tilde{F}(x)|,$$

and let $L(F, \tilde{F})$ be the Lévy distance from Problem 3.7.4. Prove that

$$L(F, \tilde{F}) \le K(F, \tilde{F})$$

and that, if the distribution function $F$ is absolutely continuous, then


$$K(F, \tilde{F}) \le \bigl(1 + \sup_{x} |F'(x)|\bigr)\, L(F, \tilde{F}).$$


Problem 3.7.7. Let $X$ and $\tilde{X}$ be two random variables defined on one and the same
probability space and let $F$ and $\tilde{F}$ be their respective distribution functions. Prove
that the Lévy distance $L(F, \tilde{F})$ is subject to the following inequalities:

$$L(F, \tilde{F}) \le d + P\{|X - \tilde{X}| > d\}, \quad \forall d > 0,$$


and 


$$L(F, \tilde{F}) \le (c + 1)\, c^{-\frac{c}{c+1}} \bigl(E|X - \tilde{X}|^{c}\bigr)^{\frac{1}{c+1}}, \quad \forall c > 0.$$


Problem 3.7.8. By using the results in Problems 3.7.6 and 3.7.7, prove that if $X$
and $\tilde{X}$ are two random variables defined on one and the same probability space, if
$F$ and $\tilde{F}$ denote their respective distribution functions, and if $\Phi = \Phi(x)$ stands
for the distribution function of the standard normal law $\mathcal{N}(0,1)$, then the following
inequality holds for any $\sigma > 0$:


Fix)-0(=) 
Oo 


(orale Foen] 


sup 
x 


3.8 The Connection Between Almost Sure Convergence
and Weak Convergence of Probability Measures
(the “Common Probability Space” Method)


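Problem 3.8.1. Prove that if $E$ is a separable metric space with metric $\rho(\cdot, \cdot)$, and
$X(\omega)$ and $Y(\omega)$ are any two random elements in $E$, defined on the probability space
$(\Omega, \mathcal{F}, P)$, then one can claim that $\rho(X(\omega), Y(\omega))$ is a real random variable on
$(\Omega, \mathcal{F}, P)$.

Hint. Let $\{z_1, z_2, \ldots\}$ be any countable and everywhere dense subset of $E$. Prove
that for every $a > 0$


$$\{\omega : \rho(X(\omega), Y(\omega)) < a\} = \bigcup_{n=1}^{\infty} \bigcup_{m=1}^{\infty} \Bigl(\bigl\{\omega : \rho(X(\omega), z_m) < \tfrac{1}{n}\bigr\} \cap \bigl\{\omega : \rho(Y(\omega), z_m) < a - \tfrac{1}{n}\bigr\}\Bigr),$$


and, by using [P §2.4, Lemma 1], conclude that $\rho(X(\omega), Y(\omega))$ must be an
$\mathcal{F}$-measurable function on $\Omega$.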


Problem 3.8.2. Prove that the function $d_P(X, Y)$, as defined in [ P §3.8, (2)], is a
metric in the space of random elements in $E$.

Hint. The statement in the previous problem shows that the set $\{\rho(X, Y) < \varepsilon\}$
is measurable, and therefore $d_P(X, Y)$ is well defined. The proof
of the fact that $d_P(X, Y)$ actually represents a metric is straightforward.


Problem 3.8.3. Prove the implication [ P §3.8, (5)]. 


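Problem 3.8.4. Setting $\Delta_h = \{x \in E : h(x) \text{ is not } \rho\text{-continuous at the point } x\}$,
prove that $\Delta_h \in \mathcal{E}$.

Hint. Let $\{a_1, a_2, \ldots\}$ be any countable and everywhere dense subset of $E$. In
order to prove that $\Delta_h \in \mathcal{E}$, it is enough to establish the following representation


$$\Delta_h = \bigcup_{n=1}^{\infty} \bigcap_{m=1}^{\infty} \bigcup_{k=1}^{\infty} A_{n,m,k},$$


where the sets


$$A_{n,m,k} = \begin{cases} B_{1/m}(a_k), & \text{if one can find } y, z \in B_{1/m}(a_k) \text{ so that } |h(y) - h(z)| > \frac{1}{n}, \\ \varnothing & \text{otherwise} \end{cases}$$


all belong to $\mathcal{E}$.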


Problem 3.8.5. Suppose that $(\xi, \eta)$ and $(\tilde{\xi}, \tilde{\eta})$ are two identically distributed pairs
of random variables, i.e., $(\xi, \eta) \stackrel{d}{=} (\tilde{\xi}, \tilde{\eta})$, with $E|\xi| < \infty$. Prove that $E(\xi \mid \eta) \stackrel{d}{=}
E(\tilde{\xi} \mid \tilde{\eta})$.

Problem 3.8.6. Let $\xi$ and $\eta$ be any two random elements (defined on a sufficiently
rich probability space) which take values in the Borel space $(E, \mathcal{E})$ (see [P §2.7,
Definition 9]). Prove that one can find a measurable function $f = f(x, y)$, defined
on $E \times [0, 1]$ and taking values in $E$, and also a random variable $\alpha$, which is
uniformly distributed in the interval $[0, 1]$, so that the following representation holds
with probability 1:

$$\xi = f(\eta, \alpha).$$


Problem 3.8.7. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E\xi_1 = 0$, $E\xi_1^2 < \infty$ and let $X_n = \sum_{k=1}^{n} \xi_k$,
$n \ge 1$. Prove the following result (known as Skorokhod’s embedding): there is a
probability space $(\Omega, \mathcal{F}, P)$, on which one can construct a Brownian motion $B =
(B_t)_{t \ge 0}$ and a sequence of stopping times $\tau = (\tau_k)_{k \ge 0}$ with $0 = \tau_0 \le \tau_1 \le \ldots$, so
that


$$(X_n)_{n \ge 1} \stackrel{d}{=} (B_{\tau_n})_{n \ge 1}$$


and $E(\tau_n - \tau_{n-1}) = E\xi_1^2$, $n \ge 1$. (As usual, the symbol “$\stackrel{d}{=}$” is understood to mean
“identity in distribution”.)


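Problem 3.8.8. Let $F = F(x)$ be a distribution function on $\mathbb{R}$ and define its inverse
$F^{-1}(u)$, $0 \le u \le 1$, by


$$F^{-1}(u) = \begin{cases} \inf\{x : F(x) > u\}, & u < 1, \\ \infty, & u = 1. \end{cases}$$

Prove that:

(a) $\{x : F(x) > u\} \subseteq \{x : F^{-1}(u) \le x\} \subseteq \{x : F(x) \ge u\}$;

(b) $F(F^{-1}(u)) \ge u$, $F^{-1}(F(x)) \ge x$;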

(c) if the function F = F(x) is continuous, then F~! (u) = inf{x; F(x) 
uy}, F! (u) = max{x; F(x) = u}, F(F7'(u)) > u, and {x; F(x) > u} 
{x; F7! (u) < x}; 

(d) inf{x; F(x) > u} = sup{x; F(x) < u}. 


Il 1V 


Remark. In Statistics the function Q(u) = F~! (u) is known as the quantile 
function of the distribution F. 


Problem 3.8.9. Let $F = F(x)$ be any distribution function and let $F^{-1} = F^{-1}(u)$
be its inverse.

Prove that if $U$ is any random variable which is uniformly distributed on $[0, 1]$,
then the distribution of the random variable $F^{-1}(U)$ is precisely $F$, i.e.,


$$P\{F^{-1}(U) \le x\} = F(x).$$

In addition, prove that if the distribution function $F = F(x)$, associated with the
random variable $X$, happens to be continuous, then the random variable $F(X)$ must
be uniformly distributed in $[0, 1]$.


Remark. If $C(u) = P\{U \le u\}$ is the distribution function of the random
variable $U$, which is uniformly distributed in $[0, 1]$, then one must have $C(F(x)) =
F(x)$ (see Problem 3.8.12).
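
The first claim of Problem 3.8.9 is the basis of inverse transform sampling, and it can be checked on a small discrete example. In the sketch below the atoms, their probabilities and the names `make_quantile` and `cdf_of_transformed_uniform` are all illustrative assumptions. The inverse is taken in the form $\inf\{x : F(x) > u\}$, and an evenly spaced grid of points in $(0, 1)$ stands in for the uniform random variable $U$.

```python
from bisect import bisect_right
from itertools import accumulate

def make_quantile(xs, probs):
    """Return u -> inf{x : F(x) > u} for the law with atoms xs (sorted) and masses probs."""
    cum = list(accumulate(probs))            # values F(x_i) at the atoms
    def quantile(u):
        i = bisect_right(cum, u)             # first index with F(x_i) > u
        return xs[min(i, len(xs) - 1)]       # guard against rounding at u close to 1
    return quantile

def cdf_of_transformed_uniform(quantile, x, n=1000):
    """Fraction of grid points u in (0, 1) with quantile(u) <= x."""
    us = [(i + 0.5) / n for i in range(n)]
    return sum(quantile(u) <= x for u in us) / n

q = make_quantile([-1, 0, 2], [0.2, 0.3, 0.5])
for x in (-1, 0, 1, 2):
    print(x, cdf_of_transformed_uniform(q, x))
```

The printed fractions reproduce the values of $F$ at the chosen points; replacing the grid by pseudo-random uniform numbers gives the usual sampling scheme.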


Problem 3.8.10. Let $F(x, y)$ be the distribution function of the pair of random
variables $(\xi, \eta)$ and let $F_1(x) = P\{\xi \le x\}$ and $F_2(y) = P\{\eta \le y\}$ be the
distribution functions of $\xi$ and $\eta$. Prove the Fréchet–Hoeffding inequality:


$$\max(F_1(x) + F_2(y) - 1, 0) \le F(x, y) \le \min(F_1(x), F_2(y)), \quad \text{for all } x, y \in \mathbb{R}.$$
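
Before proving the inequality it may help to see it hold on a concrete joint law. The following sketch uses a hypothetical joint probability mass function on four points (an assumption chosen only for illustration) and verifies both bounds at every pair of test points; `joint_cdf` and `frechet_hoeffding_holds` are names introduced here.

```python
INF = float("inf")

def joint_cdf(pmf, x, y):
    """F(x, y) = P{xi <= x, eta <= y} for a finite joint pmf {(a, b): probability}."""
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

def frechet_hoeffding_holds(pmf, points, tol=1e-12):
    for x in points:
        for y in points:
            f = joint_cdf(pmf, x, y)
            f1 = joint_cdf(pmf, x, INF)      # marginal F_1(x)
            f2 = joint_cdf(pmf, INF, y)      # marginal F_2(y)
            if not (max(f1 + f2 - 1.0, 0.0) - tol <= f <= min(f1, f2) + tol):
                return False
    return True

pmf = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.4, (1, 1): 0.2}
print(frechet_hoeffding_holds(pmf, [-0.5, 0, 0.5, 1, 1.5]))
```

The small tolerance only absorbs floating-point rounding in the sums.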


Problem 3.8.11. Let $(U, V)$ be some random vector in $[0, 1]^2$ with distribution
function

$$C(u, v) = P\{U \le u,\ V \le v\},$$


and suppose that $U$ and $V$ are both uniformly distributed in $[0, 1]$. Let $F_1(x)$ and
$F_2(y)$, $x, y \in \mathbb{R}$, be any two continuous distribution functions.
Prove that the function


$$F(x, y) = C(F_1(x), F_2(y)), \quad x, y \in \mathbb{R}, \qquad (*)$$


is a bi-variate distribution function with marginal distributions $F_1(x)$ and $F_2(y)$.


Remark. For a given bi-variate distribution function $F(x, y)$, with marginal
distributions $F_1(x)$ and $F_2(y)$, $x, y \in \mathbb{R}$, it is interesting to know how to construct
the function $C(u, v)$ so that property $(*)$ holds. Functions that share this property and
can be written as bi-variate distributions of the form $P\{U \le u, V \le v\}$, for some
random variables $U$ and $V$ that take values in $[0, 1]$, were introduced by A. Sklar
in 1959 under the name copula. His work [122] contains existence and uniqueness
results for such functions. The next problem provides an example.


Problem 3.8.12. Consider the bi-variate distribution function

$$F(x, y) = \max(x + y - 1, 0),$$


where $0 \le x, y \le 1$.

Prove that the associated marginal distribution functions, $F_1(x)$ and $F_2(y)$, give
the uniform distribution on $[0, 1]$.

Show also that for the copula


$$C(u, v) = \max(u + v - 1, 0), \quad 0 \le u, v \le 1,$$


one must have

$$F(x, y) = C(F_1(x), F_2(y)).$$


Remark. Compare the statement in this problem with the one in Problem 3.8.9.


Problem 3.8.13. Suppose that the random variables $\xi$ and $\xi_1, \xi_2, \ldots$ are chosen so
that $\xi_n \ge 0$ and $\mathrm{Law}(\xi_n) \to \mathrm{Law}(\xi)$. Prove that


$$E\xi \le \varliminf_{n} E\xi_n.$$


Hint. Use [ P §3.8, Theorem 1] and [ P §2.6, Theorem 2] (Fatou’s Lemma).


3.9 The Variation Distance Between Probability Measures. 
The Kakutani-Hellinger Distance and the Hellinger 
Integral. Applications to Absolute Continuity 
and Singularity of Probability Measures 


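Problem 3.9.1. Adopting the notation introduced in [P §3.9, Lemma 2], set

$$P \wedge \tilde{P} = E_Q(z \wedge \tilde{z}),$$

where $z \wedge \tilde{z} = \min(z, \tilde{z})$. Prove that

$$\|P - \tilde{P}\| = 2(1 - P \wedge \tilde{P})$$

and conclude that $\mathcal{E}r(P, \tilde{P}) = P \wedge \tilde{P}$ (for the definition of $\mathcal{E}r(P, \tilde{P})$ see
[P §3.9, 1]).

Hint. Use the relation $a \wedge b = \frac{1}{2}(a + b - |a - b|)$.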


Problem 3.9.2. Let $P$, $P_n$, $n \ge 1$, be probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ with
densities (relative to the Lebesgue measure) $p(x)$, $p_n(x)$, $n \ge 1$, and suppose that
$p_n(x) \to p(x)$ for Lebesgue-almost every $x$. Prove that


$$\|P - P_n\| = \int_{-\infty}^{\infty} |p(x) - p_n(x)|\, dx \to 0 \quad \text{as } n \to \infty.$$


Hint. Consider the inequality

$$\int_{-\infty}^{\infty} |p(x) - p_n(x)|\, dx \le \int_{\{|x| \le a\}} |p(x) - p_n(x)|\, dx + \int_{\{|x| > a\}} p(x)\, dx + \int_{\{|x| > a\}} p_n(x)\, dx,$$

where, for a given $\varepsilon > 0$, the constant $a > 0$ is chosen so that $\int_{\{|x| \le a\}} p(x)\, dx > 1 - \varepsilon$. By Fatou’s
lemma


$$\varliminf_{n} \int_{\{|x| \le a\}} p_n(x)\, dx \ge 1 - \varepsilon.$$

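Problem 3.9.3. Let $P$ and $\tilde{P}$ be any two probability measures. The Kullback
information $K(P, \tilde{P})$, which measures the “divergence” of $\tilde{P}$ from $P$, is defined as

$$K(P, \tilde{P}) = \begin{cases} E \ln \dfrac{dP}{d\tilde{P}}, & \text{if } P \ll \tilde{P}, \\ \infty, & \text{otherwise.} \end{cases}$$

Prove that

$$K(P, \tilde{P}) \ge -2\ln(1 - \rho^2(P, \tilde{P})) \ge 2\rho^2(P, \tilde{P}),$$

where $\rho(P, \tilde{P})$ is the Kakutani-Hellinger distance between the measures $P$ and $\tilde{P}$.

Hint. The second inequality follows from the relation $-\ln(1 - x) \ge x$, $0 \le x < 1$.
To prove the first inequality, show that

$$-2\ln(1 - \rho^2(P, \tilde{P})) = -2\ln E\sqrt{\tilde{z}/z}$$

and then conclude from Jensen’s inequality that

$$-2\ln E\sqrt{\tilde{z}/z} \le K(P, \tilde{P}).$$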
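For distributions on a finite set the chain of inequalities in Problem 3.9.3 can be checked numerically. The sketch below assumes the discrete forms $K = \sum_i p_i \ln(p_i/q_i)$ and $\rho^2 = 1 - \sum_i \sqrt{p_i q_i}$; the two probability vectors are arbitrary illustrative choices.

```python
import math

def kullback(p, q):
    """K(P, P~) = sum of p_i * ln(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def hellinger_distance_squared(p, q):
    """rho^2(P, P~) = 1 - sum of sqrt(p_i * q_i), i.e. one minus the Hellinger integral."""
    return 1.0 - sum(math.sqrt(a * b) for a, b in zip(p, q))

# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

K = kullback(p, q)
rho2 = hellinger_distance_squared(p, q)
middle = -2.0 * math.log(1.0 - rho2)

# Expected ordering: K >= -2 ln(1 - rho^2) >= 2 rho^2.
print(K, middle, 2.0 * rho2)
```

A single example of course proves nothing; it only illustrates how close the three quantities can be when $P$ and $\tilde{P}$ are not far apart.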


Problem 3.9.4. Prove formulas [ P §3.9, (11) and (12)]. 
Problem 3.9.5. Prove the two inequalities in [ P §3.9, (24)]. 


Hint. With $Q = \frac{1}{2}(P + \tilde{P})$, $z = \frac{dP}{dQ}$ and $\tilde{z} = \frac{d\tilde{P}}{dQ}$, setting $y = z - 1$, one finds
that $\tilde{z} = 2 - z = 1 - y$ and that [P §3.9, (24)] can be written in the form

$$2(1 - E_Q f(y)) \le 2E_Q|y| \le \sqrt{c_\alpha (1 - E_Q f(y))},$$

where $f(y) = (1 + y)^\alpha (1 - y)^{1-\alpha}$, $y \in [-1, 1]$. By analyzing the functions $f'(y)$
and $f''(y)$ on the interval $(-1, 1)$, one can prove that:

(a) $f = f(y)$ is concave on $[-1, 1]$ and $f(y) \ge 1 - |y|$;

(b) $f(y) \le 1 + f'(0)y - \gamma_\alpha y^2$, $y \in [-1, 1]$, with $\gamma_\alpha = \alpha(1 - \alpha)/4$.

Finally, the first inequality can be deduced from (a), while the second one can be
deduced from (b).

Problem 3.9.6. Let $P$, $\tilde{P}$, $Q$ be probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ and let $P * Q$
and $\tilde{P} * Q$ stand for the respective convolutions (see [P §2.8, 4]). Show that

$$\|P * Q - \tilde{P} * Q\| \le \|P - \tilde{P}\|.$$

Hint. Use [P §3.9, Lemma 1].


Problem 3.9.7. Prove the relations (30) from [ P §3.9, Example 2]. 
Hint. By using straightforward calculation, prove that


H (377) aa a 


and then use [ P §3.9, Theorems 2 and 3]. 


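Problem 3.9.8. Let $\xi$ and $\eta$ be any two random elements on $(\Omega, \mathcal{F}, P)$ with values
in the measurable space $(E, \mathcal{E})$. Prove that

$$|P\{\xi \in A\} - P\{\eta \in A\}| \le P\{\xi \ne \eta\}, \quad A \in \mathcal{E}.$$

Hint. Use the relation

$$|I(\xi \in A) - I(\eta \in A)| = |I(\xi \in A) - I(\eta \in A)|\, I(\xi \ne \eta).$$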


Problem 3.9.9. The Hellinger integral of order $\alpha$ for the measures $P$ and $\tilde{P}$ is
defined by (see formula [P §3.9, (20)])

$$H(\alpha; P, \tilde{P}) = \int_\Omega (dP)^\alpha (d\tilde{P})^{1-\alpha}.$$

A useful tool in the study of many statistical experiments is what is known as the
Hellinger transformation $H(\alpha; \mathcal{E})$, which is defined as follows:

Consider the statistical experiment $\mathcal{E} = (\Omega, \mathcal{F}; P_0, P_1, \ldots, P_k)$, which consists
of the measurable space $(\Omega, \mathcal{F})$ and the finite family of probability measures
$P_0, P_1, \ldots, P_k$ defined on that space.

In symbolic form the Hellinger transformation $H(\alpha; \mathcal{E})$ of the experiment $\mathcal{E}$ is
defined by the formula:

$$H(\alpha; \mathcal{E}) = \int_\Omega (dP_0)^{\alpha_0} \cdots (dP_k)^{\alpha_k}, \qquad (*)$$

where $\alpha = (\alpha_0, \ldots, \alpha_k)$ belongs to the simplex

$$\Sigma_{k+1} = \Big\{\alpha = (\alpha_0, \ldots, \alpha_k) : \alpha_i \ge 0, \ \sum_{i=0}^k \alpha_i = 1\Big\}.$$

Similarly to the case $k = 1$, give meaning to the “integral” in (*) (by using
the concept of “dominating measures”) and prove the corresponding analog of
Lemma 3.


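Problem 3.9.10. Let $(\Sigma_{k+1}, \mathcal{B}(\Sigma_{k+1}))$ denote the simplex

$$\Sigma_{k+1} = \Big\{x = (x_0, \ldots, x_k) : x_i \ge 0, \ \sum_{i=0}^k x_i = 1\Big\},$$

equipped with the associated Borel $\sigma$-algebra $\mathcal{B}(\Sigma_{k+1})$.
Let $\mu = \mu(dx)$ be any measure on $(\Sigma_{k+1}, \mathcal{B}(\Sigma_{k+1}))$, such that $\mu(\Sigma_{k+1}) < \infty$ and

$$\int_{\Sigma_{k+1}} x_i\, \mu(dx) = 1, \quad i = 1, \ldots, k.$$

(In the theory of statistical experiments, measures $\mu$ with the above properties are
known as standard measures.)

In Mathematical Analysis the Hellinger transformation, $H(\alpha; \mu)$, of the measure
$\mu$ is defined by the formula

$$H(\alpha; \mu) = \int_{\Sigma_{k+1}} x_0^{\alpha_0} \cdots x_k^{\alpha_k}\, \mu(dx)$$

for all $\alpha \in \Sigma_{k+1}$.

Prove the following statements:

(a) If $\mu_1$ and $\mu_2$ are two standard measures, such that $H(\alpha; \mu_1) = H(\alpha; \mu_2)$ for
all $\alpha \in \Sigma_{k+1}$, then one must have $\mu_1 = \mu_2$.

(b) The sequence of standard measures $\mu_n$ converges weakly to the standard
measure $\mu$ if and only if $H(\alpha; \mu_n) \to H(\alpha; \mu)$ as $n \to \infty$, for all $\alpha \in \Sigma_{k+1}$.

Let $\mathcal{E} = (\Omega, \mathcal{F}; P_0, P_1, \ldots, P_k)$ be some statistical experiment, let $Q$ be some
probability measure that dominates the measures $P_0, P_1, \ldots, P_k$, and let

$$p_i = \frac{dP_i}{dQ}, \quad i = 0, 1, \ldots, k.$$

Setting

$$\mu(A) = Q\{\omega : (p_0(\omega), \ldots, p_k(\omega)) \in A\}, \quad A \in \mathcal{B}(\Sigma_{k+1}),$$

prove that $\mu$ is a standard probability measure on the space $(\Sigma_{k+1}, \mathcal{B}(\Sigma_{k+1}))$ that shares
the property

$$H(\alpha; \mathcal{E}) = H(\alpha; \mu).$$
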
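Problem 3.9.11. Let $\mathcal{E} = (\Omega, \mathcal{F}; P_0, P_1, \ldots, P_k)$ be some statistical experiment,
suppose that the measure $P_0$ dominates the measures $P_1, \ldots, P_k$, and let

$$z_i = \frac{dP_i}{dP_0}, \quad i = 1, \ldots, k.$$

In probability theory, the Mellin transformation of the experiment $\mathcal{E}$ is defined as a
function of the argument $\beta \in \Delta_k$ given by

$$M(\beta; \mathcal{E}) = \int_\Omega z_1^{\beta_1} \cdots z_k^{\beta_k}\, P_0(d\omega) \quad (= E_0\, z_1^{\beta_1} \cdots z_k^{\beta_k}),$$

where

$$\Delta_k = \Big\{\beta = (\beta_1, \ldots, \beta_k) : 0 \le \beta_i \le 1, \ \sum_{i=1}^k \beta_i < 1\Big\}.$$

In mathematical analysis, the Mellin transformation $M(\beta; \nu)$ of the measure $\nu$ is
defined somewhat differently. Specifically, if

$$\mathbb{R}_+^k = \{x = (x_1, \ldots, x_k) : x_i \ge 0, \ i = 1, \ldots, k\}$$

and $\nu$ is a probability measure on $(\mathbb{R}_+^k, \mathcal{B}(\mathbb{R}_+^k))$, chosen so that

$$\int_{\mathbb{R}_+^k} x_i\, \nu(dx) \le 1$$

(a measure $\nu$ on $\mathbb{R}_+^k$ with this property is commonly referred to as a standard
measure), then one sets

$$M(\beta; \nu) = \int_{\mathbb{R}_+^k} x_1^{\beta_1} \cdots x_k^{\beta_k}\, \nu(dx),$$

where $\beta = (\beta_1, \ldots, \beta_k) \in \Delta_k$.

Prove that:

(a) if $\nu_1$ and $\nu_2$ are any two standard measures for which $M(\beta; \nu_1) = M(\beta; \nu_2)$
for all $\beta \in \Delta_k$, then $\nu_1 = \nu_2$;

(b) the sequence of standard measures $(\nu_n)$ converges weakly to the standard
measure $\nu$ if and only if

$$M(\beta; \nu_n) \to M(\beta; \nu), \quad \text{for all } \beta \in \Delta_k;$$

(c) $M(\beta; \mathcal{E}) = M(\beta; \nu)$.
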
Problem 3.9.12. Prove that if $\alpha = (\alpha_0, \ldots, \alpha_k) \in \Sigma_{k+1}$ and $\alpha_0 > 0$, then

$$H(\alpha; \mathcal{E}) = M(\beta; \mathcal{E}),$$

where $(\beta_1, \ldots, \beta_k) = (\alpha_1, \ldots, \alpha_k)$.
Convince yourself that if $\alpha = (\alpha_0, \ldots, \alpha_k) \in \Sigma_{k+1}$, with $\alpha_0 > 0$, and $L_i = \ln z_i$,
$i = 1, \ldots, k$, then

$$H(\alpha; \mathcal{E}) = E_0 \exp\Big\{\sum_{i=1}^k \alpha_i L_i\Big\},$$

i.e., the Hellinger transformation $H(\alpha; \mathcal{E})$ coincides with the Laplace transformation
of the vector $(L_1, \ldots, L_k)$ with respect to the measure $P_0$.


Problem 3.9.13. Suppose that $P = \|p_{ij}\|$, $1 \le i, j \le N < \infty$, is a stochastic
matrix (see [P §1.12]). The variable

$$D(P) = \frac{1}{2} \sup_{i,j} \sum_{k=1}^N |p_{ik} - p_{jk}|$$

is known as the Dobrushin ergodicity coefficient of the matrix $P$.
Prove that:

(a) $D(P) = \sup_{i,j} \|p_{i\cdot} - p_{j\cdot}\|$;
(b) $D(P) = 1 - \inf_{i,j} \sum_{k=1}^N (p_{ik} \wedge p_{jk})$;
(c) if $P$ and $Q$ are any two stochastic matrices of the same dimension, then

$$D(PQ) \le D(P) D(Q);$$

(d) if $\mu = (\mu_1, \ldots, \mu_N)$ and $\nu = (\nu_1, \ldots, \nu_N)$ are any two distributions, then

$$\|\mu P^n - \nu P^n\| \le \|\mu - \nu\| (D(P))^n.$$
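The definition of $D(P)$ and the statements (b) and (c) of Problem 3.9.13 lend themselves to a quick numerical experiment. The sketch below is only an illustration under assumed inputs: the two $3 \times 3$ stochastic matrices and all function names are hypothetical choices.

```python
from itertools import product

def dobrushin(P):
    """D(P) = (1/2) * max over pairs of rows of the l1-distance between the rows."""
    rows = range(len(P))
    return 0.5 * max(
        sum(abs(a - b) for a, b in zip(P[i], P[j]))
        for i, j in product(rows, repeat=2)
    )

def dobrushin_via_overlap(P):
    """The same coefficient written as 1 - min over row pairs of sum_k min(p_ik, p_jk)."""
    rows = range(len(P))
    return 1.0 - min(
        sum(min(a, b) for a, b in zip(P[i], P[j]))
        for i, j in product(rows, repeat=2)
    )

def matmul(A, B):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# Hypothetical stochastic matrices with dyadic entries (exact in floating point).
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
Q = [[0.75, 0.25, 0.0], [0.25, 0.5, 0.25], [0.0, 0.25, 0.75]]

d_P, d_Q = dobrushin(P), dobrushin(Q)
d_PQ = dobrushin(matmul(P, Q))
print(d_P, dobrushin_via_overlap(P), d_PQ, d_P * d_Q)
```

For this particular pair the bound in (c) happens to be attained, which shows that the inequality cannot be improved in general.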


Problem 3.9.14. Suppose that $\xi$ and $\eta$ are two random variables with probability
distributions $P$ and $Q$. Show the coupling inequality:

$$P\{\xi = \eta\} \le 1 - \frac{1}{2}\|P - Q\|$$

and compare this relation with the statement in Problem 3.9.8. In particular, if $\xi$ and
$\eta$ are two random variables with densities $p(x)$ and $q(x)$, then

$$P\{\xi = \eta\} \le 1 - \frac{1}{2} \int |p(x) - q(x)|\,dx.$$

Give examples in which the above inequality turns into equality.

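One way to look for cases of equality in Problem 3.9.14, at least for distributions on a finite set, is the construction usually called a maximal coupling. The Python sketch below is a hedged illustration of that idea: the function name, the two probability vectors, and the product form chosen for the off-diagonal mass are assumptions made for the example.

```python
from fractions import Fraction as Fr

def maximal_coupling(p, q):
    """Joint pmf (matrix) with marginals p and q and as much mass as possible on {xi = eta}."""
    overlap = [min(a, b) for a, b in zip(p, q)]
    deficit = 1 - sum(overlap)                      # mass that cannot sit on the diagonal
    excess_p = [a - o for a, o in zip(p, overlap)]  # (p - q)^+
    excess_q = [b - o for b, o in zip(q, overlap)]  # (q - p)^+
    n = len(p)
    joint = [[Fr(0)] * n for _ in range(n)]
    for i in range(n):
        joint[i][i] += overlap[i]
        for j in range(n):
            if deficit > 0:
                # excess_p[i] * excess_q[i] is always 0, so the diagonal is untouched here
                joint[i][j] += excess_p[i] * excess_q[j] / deficit
    return joint

# Hypothetical distributions on three points.
p = [Fr(1, 2), Fr(1, 3), Fr(1, 6)]
q = [Fr(1, 4), Fr(1, 4), Fr(1, 2)]

joint = maximal_coupling(p, q)
row_sums = [sum(row) for row in joint]
col_sums = [sum(joint[i][j] for i in range(3)) for j in range(3)]
prob_equal = sum(joint[i][i] for i in range(3))
variation = sum(abs(a - b) for a, b in zip(p, q))
print(prob_equal, 1 - variation / 2)
```

Exact rational arithmetic is used so that the comparison of $P\{\xi = \eta\}$ with $1 - \frac{1}{2}\|P - Q\|$ is not blurred by rounding.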
Problem 3.9.15. Let $X = (X_n)_{n \ge 0}$ and $Y = (Y_n)_{n \ge 0}$ be any two random sequences,
defined on some probability space $(\Omega, \mathcal{F}, P)$. Let $\tau$ be a random moment,
such that $X_n(\omega) = Y_n(\omega)$ for all $n \ge \tau(\omega)$ (the moment $\tau$ is sometimes referred to
as the coupling time). Letting $P_n$ and $Q_n$ denote the probability distributions of the
variables $X_n$ and $Y_n$, prove the coupling inequality:

$$\frac{1}{2}\|P_n - Q_n\| \le P\{\tau \ge n\}.$$

Problem 3.9.16. Let $f = f(x)$ and $g = g(x)$ be any two probability densities on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. Prove that:

(a) $\int |f(x) - g(x)|\,dx = 2\int (f(x) - g(x))^+\,dx = 2\int (g(x) - f(x))^+\,dx$;

(b) $\big(\int \sqrt{f(x) g(x)}\,dx\big)^2 \le 2\int \min(f(x), g(x))\,dx$;

(c) $\int |f(x) - g(x)|\,dx \le \sqrt{2K(f, g)}$, where $K(f, g) = \int f(x) \ln\frac{f(x)}{g(x)}\,dx$ is
Kullback’s information (see Problem 3.9.3) and the probability distribution $P_f$,
associated with the density $f$, is assumed to be absolutely continuous with respect
to the distribution $P_g$, associated with the density $g$;

(d) $\int \min(f(x), g(x))\,dx \ge \frac{1}{2} e^{-K(f, g)}$.


Problem 3.9.17. Suppose that the random vector $X = (X_1, \ldots, X_k)$ is uniformly
distributed inside the set

$$T_k = \Big\{x = (x_1, \ldots, x_k) : x_i \ge 0, \ \sum_{i=1}^k x_i \le 1\Big\}.$$

Prove that the probability density of the random vector $X$ is given by the formula

$$f(x) = k!, \quad x \in T_k.$$


Problem 3.9.18. Let $X$ and $Y$ be any two random variables with $EX^2 < \infty$ and
$EY^2 < \infty$, let $\mathrm{cov}(X, Y) = E(X - EX)(Y - EY)$, and let $F(x, y)$, $F_1(x)$,
and $F_2(y)$ denote, respectively, the distribution functions of the random elements
$(X, Y)$, $X$, and $Y$.

Prove the Hoeffding formula:

$$\mathrm{cov}(X, Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (F(x, y) - F_1(x) F_2(y))\,dx\,dy.$$

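For a pair $(X, Y)$ with finitely many values the integrand in the Hoeffding formula is a step function, so the double integral becomes a finite sum over grid cells. The following Python sketch checks the formula exactly on a hypothetical joint distribution; the probability table and all names are assumptions made for illustration.

```python
from fractions import Fraction as Fr

# Hypothetical joint pmf of (X, Y): {(x, y): probability}; the probabilities sum to 1.
pmf = {
    (0, 0): Fr(1, 8), (0, 1): Fr(1, 8), (0, 3): Fr(0),
    (2, 0): Fr(1, 8), (2, 1): Fr(1, 4), (2, 3): Fr(1, 8),
    (5, 0): Fr(0),    (5, 1): Fr(1, 8), (5, 3): Fr(1, 8),
}
xs = sorted({x for x, _ in pmf})
ys = sorted({y for _, y in pmf})

def joint_cdf(x, y):
    return sum(p for (a, b), p in pmf.items() if a <= x and b <= y)

def cdf_x(x):
    return sum(p for (a, _), p in pmf.items() if a <= x)

def cdf_y(y):
    return sum(p for (_, b), p in pmf.items() if b <= y)

mean_x = sum(x * p for (x, _), p in pmf.items())
mean_y = sum(y * p for (_, y), p in pmf.items())
covariance = sum((x - mean_x) * (y - mean_y) * p for (x, y), p in pmf.items())

# F - F_1 F_2 is constant on each cell [xs[i], xs[i+1]) x [ys[j], ys[j+1])
# and vanishes outside [xs[0], xs[-1]) x [ys[0], ys[-1]).
hoeffding_integral = sum(
    (joint_cdf(xs[i], ys[j]) - cdf_x(xs[i]) * cdf_y(ys[j]))
    * (xs[i + 1] - xs[i]) * (ys[j + 1] - ys[j])
    for i in range(len(xs) - 1)
    for j in range(len(ys) - 1)
)
print(covariance, hoeffding_integral)
```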
3.10 Contiguity (Proximity) and Full Asymptotic Separation
of Probability Measures


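Problem 3.10.1. Let $P^n = P_1^n \times \cdots \times P_n^n$ and $\tilde{P}^n = \tilde{P}_1^n \times \cdots \times \tilde{P}_n^n$, $n \ge 1$, where $P_k^n$
and $\tilde{P}_k^n$ are Gaussian measures with parameters $(a_k^n, 1)$ and $(\tilde{a}_k^n, 1)$. Find conditions
for $(a_k^n)$ and $(\tilde{a}_k^n)$ that ensure the relations $(\tilde{P}^n) \lhd (P^n)$ and $(P^n) \lhd (\tilde{P}^n)$.
Hint. Use direct calculation to show that

$$H(\alpha; P^n, \tilde{P}^n) = \exp\Big\{-\frac{\alpha(1 - \alpha)}{2} \sum_{k=1}^n (a_k^n - \tilde{a}_k^n)^2\Big\}$$

and take into account the relations [P §3.10, (11) and (12)].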


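Problem 3.10.2. Let $P^n = P_1^n \times \cdots \times P_n^n$ and $\tilde{P}^n = \tilde{P}_1^n \times \cdots \times \tilde{P}_n^n$, where $P_k^n$
and $\tilde{P}_k^n$ are probability measures on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, such that $P_k^n(dx) = I_{[0,1]}(x)\,dx$
and $\tilde{P}_k^n(dx) = I_{[a_n, 1+a_n]}(x)\,dx$, for some choice of $0 \le a_n \le 1$. Prove that
$H(\alpha; P_k^n, \tilde{P}_k^n) = 1 - a_n$ and

$$(\tilde{P}^n) \lhd (P^n) \iff (P^n) \lhd (\tilde{P}^n) \iff \limsup_n n a_n = 0,$$
$$(\tilde{P}^n) \mathbin{\triangle} (P^n) \iff \limsup_n n a_n = \infty.$$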

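Problem 3.10.3. Consider the structure $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 0})$, which consists of a
measurable space $(\Omega, \mathcal{F})$ and a flow of $\sigma$-algebras $(\mathcal{F}_n)_{n \ge 0}$, chosen so that
$\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}$. Set $\mathcal{F} = \sigma(\bigcup_n \mathcal{F}_n)$, suppose that $P$ and $\tilde{P}$ are two probability
measures on $(\Omega, \mathcal{F})$, and denote by $P_n = P|\mathcal{F}_n$ and $\tilde{P}_n = \tilde{P}|\mathcal{F}_n$ their respective
restrictions to $\mathcal{F}_n$. Prove that

$$(\tilde{P}_n) \lhd (P_n) \iff \tilde{P} \ll P,$$
$$(\tilde{P}_n) \lhd\rhd (P_n) \iff \tilde{P} \sim P,$$
$$(\tilde{P}_n) \mathbin{\triangle} (P_n) \iff \tilde{P} \perp P.$$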


Problem 3.10.4. Consider the probability space $(\Omega, \mathcal{F}, P)$, in which $\Omega =
\{-1, 1\}^\infty$ is the space of binary sequences $\omega = (\omega_1, \omega_2, \ldots)$ and the probability
measure $P$ is chosen so that $P\{\omega : (\omega_1, \ldots, \omega_n) = (a_1, \ldots, a_n)\} = 2^{-n}$, for every
$a_i = \pm 1$, $i = 1, \ldots, n$. Given any $n \ge 1$, let $\varepsilon_n(\omega) = \omega_n$. (In particular, under the
measure $P$, the sequence $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots)$ is a sequence of independent Bernoulli
random variables with $P\{\varepsilon_n = 1\} = P\{\varepsilon_n = -1\} = \frac{1}{2}$.)

Next, define the sequence $S = (S_n)_{n \ge 0}$ according to the recursive rule $S_0 = 1$
and $S_n = S_{n-1}(1 + \rho_n)$, where $\rho_n = \mu_n + \sigma_n \varepsilon_n$, $\sigma_n > 0$, $\mu_n > \sigma_n - 1$. (In the
context of financial mathematics the random variable $S_n > 0$ is usually interpreted
as “the price” of a given security in period $n$; see [P §7.11].)
Let $P^n = P|\mathcal{F}_n$, where $\mathcal{F}_n = \sigma(\varepsilon_1, \ldots, \varepsilon_n)$. On the measurable space $(\Omega, \mathcal{F})$
one can define a new measure $\tilde{P}$ in such a way that under $\tilde{P}$ the random variables
$\varepsilon_1, \varepsilon_2, \ldots$ are again independent, but also share the property

$$\tilde{P}\{\varepsilon_n = 1\} = \frac{1}{2}(1 + b_n), \quad \tilde{P}\{\varepsilon_n = -1\} = \frac{1}{2}(1 - b_n), \quad \text{where } b_n = -\frac{\mu_n}{\sigma_n}.$$

Prove that the sequence $S = (S_n)_{n \ge 0}$ forms a martingale relative to the measure $\tilde{P}$
(see [P §1.11] and [P §7.1]).
Setting $\tilde{P}^n = \tilde{P}|\mathcal{F}_n$, prove that

$$H(\alpha; \tilde{P}^n, P^n) = \prod_{k=1}^n \frac{(1 + b_k)^\alpha + (1 - b_k)^\alpha}{2}.$$

Finally, by using [P §3.10, Theorem 1] conclude that

$$(\tilde{P}^n) \lhd (P^n) \iff \sum_{k=1}^\infty b_k^2 < \infty.$$

(In the context of “large” financial markets the previous statement implies that
the condition $\sum_{k=1}^\infty (\mu_k/\sigma_k)^2 < \infty$ is necessary and sufficient for the absence of
asymptotic arbitrage; for more details see §3, Chap. VI, in the book [120].)
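The role of $b_n = -\mu_n/\sigma_n$ in Problem 3.10.4 can be seen from a one-step computation. The sketch below is illustrative only: the parameter pairs are hypothetical, and the second function merely evaluates a single factor $\frac{1}{2}[(1 + b)^\alpha + (1 - b)^\alpha]$ of a Hellinger-type product to show its second-order behaviour for small $b$.

```python
from fractions import Fraction as Fr

def one_step_mean(mu, sigma):
    """Mean of 1 + rho_n when eps_n = +1 has probability (1 + b)/2 and b = -mu/sigma."""
    b = -mu / sigma
    p_up, p_down = (1 + b) / 2, (1 - b) / 2
    return p_up * (1 + mu + sigma) + p_down * (1 + mu - sigma)

def hellinger_factor(b, alpha):
    """One factor of the product: ((1 + b)^alpha + (1 - b)^alpha) / 2."""
    return ((1 + b) ** alpha + (1 - b) ** alpha) / 2

# Hypothetical (mu, sigma) pairs with sigma > 0, mu > sigma - 1 and |mu| <= sigma.
parameters = [(Fr(1, 10), Fr(1, 2)), (Fr(-1, 5), Fr(1, 4)), (Fr(0), Fr(1, 3))]
means = [one_step_mean(mu, sigma) for mu, sigma in parameters]

# For small b the factor behaves like 1 - alpha * (1 - alpha) * b^2 / 2.
b, alpha = 0.01, 0.5
factor = hellinger_factor(b, alpha)
second_order = 1 - alpha * (1 - alpha) * b ** 2 / 2
print(means, factor, second_order)
```

The quadratic behaviour in $b$ is what makes a condition on $\sum_k b_k^2$ natural in this setting.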


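Problem 3.10.5. Suppose that, unlike the security-pricing model discussed in the
previous problem, one sets $S_0 = 1$ and $S_n = e^{h_1 + \cdots + h_n}$, $n \ge 1$, where $h_k =
\mu_k + \sigma_k \varepsilon_k$, for some $\sigma_k > 0$ and some sequence $(\varepsilon_1, \varepsilon_2, \ldots)$ of independent and
identically distributed Gaussian ($\mathcal{N}(0, 1)$) random variables.

With $\mathcal{F}_n = \sigma(\varepsilon_1, \ldots, \varepsilon_n)$ and $P^n = P|\mathcal{F}_n$, $n \ge 1$, prove that the sequence
$(S_n)_{n \ge 0}$ forms a martingale (see [P §7.11]) relative to the measure $\tilde{P}$, which is
defined by $\tilde{P}|\mathcal{F}_n = \tilde{P}^n$, where $d\tilde{P}^n = z_n\,dP^n$ with

$$z_n = \exp\Big\{-\sum_{k=1}^n \Big[\Big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\Big)\varepsilon_k + \frac{1}{2}\Big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\Big)^2\Big]\Big\}.$$

Show also that

$$H(\alpha; \tilde{P}^n, P^n) = \exp\Big\{-\frac{\alpha(1 - \alpha)}{2} \sum_{k=1}^n \Big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\Big)^2\Big\}$$

and

$$(\tilde{P}^n) \lhd (P^n) \iff \sum_{k=1}^\infty \Big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\Big)^2 < \infty.$$

(The condition $\sum_{k=1}^\infty \big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\big)^2 < \infty$ guarantees the absence of asymptotic arbitrage
in this financial market; see §3c, Chap. VI, in the book [120].)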


3.11 Rate of Convergence in the Central Limit Theorem 


Problem 3.11.1. Prove the inequalities in [ P §3.11, (8)]. 


Problem 3.11.2. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent and identically
distributed random variables with $E\xi_1 = 0$, $D\xi_1 = \sigma^2$ and $E|\xi_1|^3 < \infty$. The
following non-uniform estimate is well known:

$$|F_n(x) - \Phi(x)| \le \frac{C E|\xi_1|^3}{\sigma^3 \sqrt{n}} \cdot \frac{1}{(1 + |x|)^3}$$

for all $-\infty < x < \infty$.

Prove this result at least in the case of Bernoulli random variables. (The statements
in this problem and in Problems 3.11.5-3.11.7 below are discussed, for example,
in the book [94].)


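Problem 3.11.3. Let $(\xi_k)_{k \ge 1}$ be a sequence of independent and identically distributed
random variables that take two values $\pm 1$ with equal probability (1/2).
Setting $\varphi_2(t) = E e^{it\xi_1} = \frac{1}{2}(e^{it} + e^{-it})$ and $S_k = \xi_1 + \cdots + \xi_k$, show, following
Laplace, that

$$P\{S_{2n} = 0\} = \frac{1}{\pi} \int_0^\pi \varphi_2^{2n}(t)\,dt \sim \frac{1}{\sqrt{\pi n}} \quad \text{as } n \to \infty.$$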

Problem 3.11.4. Let $(\xi_k)_{k \ge 1}$ be a sequence of independent and identically distributed
random variables, taking $2a + 1$ integer values $0, \pm 1, \ldots, \pm a$, and set
$\varphi_{2a+1}(t) = E e^{it\xi_1} = \frac{1}{2a+1}\big(1 + 2\sum_{k=1}^a \cos tk\big)$.

Just as in the previous problem, prove (again, following Laplace) that

$$P\{S_n = 0\} = \frac{1}{\pi} \int_0^\pi \varphi_{2a+1}^n(t)\,dt \sim \frac{\sqrt{3}}{\sqrt{2\pi a(a + 1) n}} \quad \text{as } n \to \infty.$$

In particular, for $a = 1$, i.e., in the special case where $\xi_k$ takes the values $-1, 0, 1$,
one must have

$$P\{S_n = 0\} \sim \frac{\sqrt{3}}{2\sqrt{\pi n}} \quad \text{as } n \to \infty.$$
Problem 3.11.5. Prove that if $F = F(x)$ and $G = G(x)$ are two distribution
functions, associated with two integer-valued random variables, and if $f(t)$ and
$g(t)$ denote their respective characteristic functions, then

$$\sup_x |F(x) - G(x)| \le \frac{1}{4} \int_{-\pi}^{\pi} \Big|\frac{f(t) - g(t)}{t}\Big|\,dt.$$


Problem 3.11.6. Prove that if $F$ and $G$ are two distribution functions, $f(t)$ and
$g(t)$ are their respective characteristic functions, and $L(F, G)$ is the Lévy distance
between $F$ and $G$ (see Problem 3.1.4), then for every $T \ge 2$ one must have

$$L(F, G) \le \frac{1}{\pi} \int_0^T \Big|\frac{f(t) - g(t)}{t}\Big|\,dt + 2e\,\frac{\ln T}{T}.$$


Problem 3.11.7. Let F,,(x) be the distribution function of the normalized sum 


= l 7 X; & of some finite collection of independent and identically distributed 


random variables, such that Eg; = 0, E£? = o? > 0 and Ejé;|? = p3 < oo. Setting 
Bs 
F,(x)— o(*) 
o 


p = %3 prove that 
3.12 Rate of Convergence in the Poisson Limit Theorem 


lim inf y/n 


n (a,c) 


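Problem 3.12.1. Prove that with $\lambda_k = -\ln(1 - p_k)$ the variation distance
$\|B(p_k) - \Pi(\lambda_k)\|$ satisfies the relation

$$\|B(p_k) - \Pi(\lambda_k)\| = 2(1 - e^{-\lambda_k} - \lambda_k e^{-\lambda_k}) \quad (\le \lambda_k^2)$$

and, therefore, $\|B - \Pi\| \le \sum_{k=1}^n \lambda_k^2$.
Hint. The inequality $\|B(p_k) - \Pi(\lambda_k)\| \le \lambda_k^2$ follows from the formula

$$\|B(p_k) - \Pi(\lambda_k)\| = |(1 - p_k) - e^{-\lambda_k}| + |p_k - \lambda_k e^{-\lambda_k}| + e^{-\lambda_k} \sum_{i=2}^\infty \frac{\lambda_k^i}{i!}$$

and the fact that $2(1 - e^{-x} - x e^{-x}) \le x^2$, for $x \ge 0$.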
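The closed form in Problem 3.12.1 can be compared with a direct term-by-term evaluation of the variation distance $\sum_i |b_i - \pi_i|$ between the Bernoulli and Poisson laws. In the sketch below the truncation of the Poisson series at 80 terms and the sample values of $p$ are assumptions made for illustration.

```python
import math

def direct_variation(p, terms=80):
    """Sum over i of |Bernoulli(p){i} - Poisson(lam){i}| with lam = -ln(1 - p), truncated."""
    lam = -math.log(1.0 - p)
    poisson = [math.exp(-lam) * lam ** i / math.factorial(i) for i in range(terms)]
    bernoulli = [1.0 - p, p] + [0.0] * (terms - 2)
    return sum(abs(b - q) for b, q in zip(bernoulli, poisson))

def closed_form(p):
    """2 * (1 - exp(-lam) - lam * exp(-lam)) with lam = -ln(1 - p)."""
    lam = -math.log(1.0 - p)
    return 2.0 * (1.0 - math.exp(-lam) - lam * math.exp(-lam))

# Tuples (p, direct sum, closed form, lam^2) for a few hypothetical values of p.
checks = [
    (p, direct_variation(p), closed_form(p), math.log(1.0 - p) ** 2)
    for p in (0.05, 0.3, 0.6)
]
for row in checks:
    print(row)
```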


Problem 3.12.2. Prove the relations [ P §3.12, (9) and (10)]. 
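Problem 3.12.3. Let $\xi_1, \ldots, \xi_n$ be independent Bernoulli random variables that
take the values 1 and 0 with probabilities $P\{\xi_k = 1\} = p_k$, $P\{\xi_k = 0\} = 1 - p_k$,
$1 \le k \le n$. Given any $0 \le t \le 1$ and $\lambda > 0$, set $\xi_0 = 0$,

$$S_n(t) = \sum_{k=0}^{[nt]} \xi_k,$$

$$P_k^{(n)}(t) = P\{S_n(t) = k\}, \quad \pi_k(t) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, 2, \ldots,$$

and

$$A_n(t) = \sum_{k=0}^{[nt]} p_k \quad (= E S_n(t)).$$

Prove that the probabilities $P_k^{(n)}(t)$ and $\pi_k(t)$ satisfy the following relations:

$$P_0^{(n)}(t) = 1 - \int_0^t P_0^{(n)}(s-)\,dA_n(s),$$
$$P_k^{(n)}(t) = -\int_0^t [P_k^{(n)}(s-) - P_{k-1}^{(n)}(s-)]\,dA_n(s), \quad k \ge 1, \qquad (*)$$

and

$$\pi_0(t) = 1 - \int_0^t \pi_0(s-)\,d(\lambda s),$$
$$\pi_k(t) = -\int_0^t [\pi_k(s-) - \pi_{k-1}(s-)]\,d(\lambda s), \quad k \ge 1. \qquad (**)$$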


Problem 3.12.4. By using the relations (*) and (**) in the previous problem, prove
that

$$\sum_{k=0}^\infty |P_k^{(n)}(t) - \pi_k(t)| \le 2\int_0^t \sum_{k=0}^\infty |P_k^{(n)}(s-) - \pi_k(s-)|\,d(\lambda s) + (2 + 4A_n(t)) \max_{0 \le s \le t} |A_n(s) - \lambda s|. \qquad (***)$$

Problem 3.12.5. By using the Gronwall-Bellman inequality (see Problem 2.6.51)
and the notation adopted in Problem 3.12.3, conclude from (***) that

$$\sum_{k=0}^\infty |P_k^{(n)}(t) - \pi_k(t)| \le e^{2\lambda t} (2 + 4A_n(t)) \max_{0 \le s \le t} |A_n(s) - \lambda s|.$$

Then conclude from the last relation that

$$\sum_{k=0}^\infty |P\{S_n(1) = k\} - \pi_k(1)| \le \Big(2 + 4\sum_{k=1}^n p_k\Big) e^{2\lambda} \min_i \sup_{0 \le s \le 1} \Big|\sum_{k=0}^{[ns]} p_{i_k} - \lambda s\Big|,$$

where the min is taken with respect to all permutations $i = (i_1, \ldots, i_n)$ of the
numbers $(1, \ldots, n)$ and $p_{i_0} = 0$.
By using the above inequality, prove that if $\sum_{k=1}^n p_k = \lambda$ then

$$\sum_{k=0}^\infty \Big|P\{S_n(1) = k\} - \frac{\lambda^k e^{-\lambda}}{k!}\Big| \le C(\lambda) \min_i \sup_{0 \le s \le 1} \Big|\sum_{k=0}^{[ns]} p_{i_k} - \lambda s\Big|,$$

where $C(\lambda) = (2 + 4\lambda) e^{2\lambda}$.

Problem 3.13.1. Prove formula [P §3.13, (18)].

Problem 3.13.2. By using the notation adopted in [P §3.13, 4], prove that the
convergence $P^{(N)} \xrightarrow{w} P$ (in $(D, \mathcal{D}, \rho)$) implies the convergence $f(X^{(N)}) \xrightarrow{d} f(X)$.

Problem 3.13.3. Verify the implication [P §3.13, (22)].


Problem 3.13.4. Let $\xi_1, \xi_2, \ldots$ and $\eta_1, \eta_2, \ldots$ be two sequences of independent
and identically distributed random variables with continuous distribution functions,
respectively, $F = F(x)$ and $G = G(x)$. Consider the empirical distribution
functions

$$F_N(x; \omega) = \frac{1}{N} \sum_{k=1}^N I(\xi_k(\omega) \le x) \quad \text{and} \quad G_M(x; \omega) = \frac{1}{M} \sum_{k=1}^M I(\eta_k(\omega) \le x)$$

and set

$$D_{N,M}(\omega) = \sup_x |F_N(x; \omega) - G_M(x; \omega)|$$

and

$$D_{N,M}^+(\omega) = \sup_x (F_N(x; \omega) - G_M(x; \omega)).$$

In the case of two samples, of the type described above, it is well known that

$$\lim_{N, M \to \infty} P\Big\{\sqrt{\frac{NM}{N + M}}\, D_{N,M}(\omega) \le y\Big\} = K(y), \quad y > 0, \qquad (*)$$

where $K = K(y)$ denotes the Kolmogorov distribution (see [P §3.13, 5]).
By following the ideas on which the proof of [P §3.13, (25)] is based, sketch the
main steps in the proof of (*) and the proof of [P §3.13, (27) and (28)].


Problem 3.13.5. Consider the “omega-square statistics”

$$\omega_N^2(\omega) = \int_{-\infty}^{\infty} |F_N(x; \omega) - F(x)|^2\,dF(x), \qquad (**)$$

associated with the continuous distribution function $F = F(x)$. Prove that,
similarly to the statistics $D_N(\omega)$ and $D_N^+(\omega)$, the distribution of the statistics $\omega_N^2(\omega)$
is one and the same for all continuous distribution functions $F = F(x)$. Show also
that

$$E\omega_N^2 = \frac{1}{6N}, \qquad D\omega_N^2 = \frac{4N - 3}{180 N^3}.$$


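Problem 3.13.6. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent and identically distributed
random variables, chosen so that $E\xi_1 = 0$ and $D\xi_1 = 1$. Setting

$$\mathcal{R}_n = \max_{k \le n} \Big(S_k - \frac{k}{n} S_n\Big) - \min_{k \le n} \Big(S_k - \frac{k}{n} S_n\Big),$$

prove that

$$\frac{\mathcal{R}_n}{\sqrt{n}} \stackrel{d}{=} \max_{\{t = k/n:\, k = 0, 1, \ldots, n\}} (B_t - tB_1) - \min_{\{t = k/n:\, k = 0, 1, \ldots, n\}} (B_t - tB_1)$$
$$\stackrel{d}{=} \max_{\{t = k/n:\, k = 0, 1, \ldots, n\}} B_t^0 - \min_{\{t = k/n:\, k = 0, 1, \ldots, n\}} B_t^0,$$

where $B = (B_t)_{t \le 1}$ is a Brownian motion, $B^0 = (B_t^0)_{t \le 1}$ is a Brownian bridge and,
as usual, “$\stackrel{d}{=}$” stands for “identity in distribution.”
Show also that

$$E\mathcal{R}_n \sim \sqrt{\frac{\pi}{2}\, n} \quad \text{and} \quad D\mathcal{R}_n \sim \Big(\frac{\pi^2}{6} - \frac{\pi}{2}\Big) n.$$

(Comp. with Problem 2.13.48.)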


Problem 3.13.7. Let $F = F(x)$ and $G = G(x)$ be any two distribution functions,
let $F^{-1}(t) = \inf\{x : F(x) > t\}$ and $G^{-1}(t) = \inf\{x : G(x) > t\}$, let $\mathfrak{F}_2$ stand for
the space of all distribution functions $F$ with $\int_{-\infty}^{\infty} x^2\,dF(x) < \infty$, and let

$$d_2(F, G) = \Big(\int_0^1 |F^{-1}(t) - G^{-1}(t)|^2\,dt\Big)^{1/2}, \quad F, G \in \mathfrak{F}_2.$$

(a) Prove that the function $d_2 = d_2(F, G)$, which is known as the Wasserstein
metric, is indeed a metric and the space $\mathfrak{F}_2$ is complete for the metric $d_2$.

(b) Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random variables,
which share one and the same distribution function $F \in \mathfrak{F}_2$, and let $F_n$ be the
empirical distribution function associated with the sample $\xi_1, \ldots, \xi_n$. Prove that one
has (P-a. e.)

$$d_2(F, F_n) \to 0 \quad \text{as } n \to \infty.$$

(c) Prove that for any $F, G \in \mathfrak{F}_2$ the following (coupling-type) relation is in force

$$d_2^2(F, G) = \inf E(\xi - \eta)^2,$$

where the inf is taken over all possible pairs of random variables $(\xi, \eta)$, chosen so
that $\xi$ has distribution function $F \in \mathfrak{F}_2$ and $\eta$ has distribution function $G \in \mathfrak{F}_2$.
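For two empirical distributions built from samples of the same size $n$, both quantile functions are constant on the intervals $(k/n, (k+1)/n)$, so $d_2$ reduces to a finite sum over order statistics. The Python sketch below computes it this way and compares the result with a brute-force search over one-to-one pairings of the sample points. Restricting the couplings in (c) to such pairings is a simplifying assumption of the example, and the two samples are hypothetical.

```python
import math
from itertools import permutations

def wasserstein2_empirical(xs, ys):
    """d_2 between the empirical laws of two samples of equal size (quantile coupling)."""
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    n = len(xs)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / n)

def best_pairing_cost(xs, ys):
    """Smallest mean squared gap over all one-to-one pairings of the sample points."""
    n = len(xs)
    return min(
        sum((a - b) ** 2 for a, b in zip(xs, perm)) / n
        for perm in permutations(ys)
    )

# Hypothetical samples of size 4.
xs = [0.0, 2.0, 5.0, 1.0]
ys = [4.0, 0.5, 3.0, 1.5]

d2 = wasserstein2_empirical(xs, ys)
brute_force = best_pairing_cost(xs, ys)
print(d2 ** 2, brute_force)
```

The brute-force search visits all $n!$ pairings, so it is meant for very small samples only.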


Problem 3.13.8. Let $\mathfrak{F}_1$ stand for the space of all distribution functions $F$ with
$\int_{-\infty}^{\infty} |x|\,dF(x) < \infty$ and let

$$d_1(F, G) = \int_0^1 |F^{-1}(t) - G^{-1}(t)|\,dt, \quad F, G \in \mathfrak{F}_1.$$

(a) Prove that the function $d_1 = d_1(F, G)$, which is known as the Dobrushin
metric, is indeed a metric and the space $\mathfrak{F}_1$ is complete for the metric $d_1$.

(b) Prove that if $F, F_1, F_2, \ldots \in \mathfrak{F}_1$, then $d_1(F, F_n) \to 0$ as $n \to \infty$ if and only if
$F_n \Rightarrow F$ and $\int |x|\,dF_n(x) \to \int |x|\,dF(x)$ (the symbol “$\Rightarrow$” stands for “converges
essentially”; see [P §3.1]).

(c) Prove that for any $F, G \in \mathfrak{F}_1$ the following (coupling-type) relation is in force

$$d_1(F, G) = \inf E|\xi - \eta|,$$

where the infimum is taken over all possible pairs of random variables $(\xi, \eta)$, chosen
so that $\xi$ has distribution function $F \in \mathfrak{F}_1$ and $\eta$ has distribution function $G \in \mathfrak{F}_1$.

(d) Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random variables,
which share one and the same distribution function $F \in \mathfrak{F}_1$, and let $F_n$ be the
empirical distribution function associated with the sample $\xi_1, \ldots, \xi_n$. Prove that one
has (P-a. e.)

$$d_1(F, F_n) \to 0 \quad \text{as } n \to \infty.$$


Problem 3.13.9. Let $F = F(x)$, $x \in \mathbb{R}$, be the distribution function of some
random variable $X$ and let $F^{-1} = F^{-1}(u)$, $u \in [0, 1]$, be the inverse of $F$, as defined
in Problem 3.8.8. Given any $0 < p < 1$, the quantity $x_p = F^{-1}(p)$ is known
as the p-quantile of the random variable $X$, or, equivalently, of the distribution
function $F = F(x)$. (The quantity $F^{-1}(1/2)$ is often referred to as the “median,”
while $F^{-1}(1/4)$ and $F^{-1}(3/4)$ are commonly referred to, respectively, as the “lower
quartile” and the “upper quartile.”)

Give the conditions under which the p-quantile $x_p$ can be characterized as the
unique root of the equation $F(x) = p$.

Problem 3.13.10. Let $X_1, \ldots, X_n, \ldots$ be independent and identically distributed
random variables, which share one and the same distribution function $F = F(x)$,
and let

$$F_n(x) = F_n(x; \omega) = \frac{1}{n} \sum_{k=1}^n I(X_k(\omega) \le x)$$

be the empirical distribution function constructed from the sample $X_1, \ldots, X_n$ (see
formula [P §3.13, (1)]).

Prove that if $\hat{X}_1^{(n)}, \ldots, \hat{X}_n^{(n)}$ are the ordered statistics constructed from the
observations $X_1, \ldots, X_n$ (in Problems 1.12.8 and 2.8.19 these statistics are denoted
by $X_1^{(n)}, \ldots, X_n^{(n)}$), then the empirical distribution function $F_n = F_n(x)$ admits the
following representation:

$$F_n(x) = \begin{cases} 0, & \text{if } x < \hat{X}_1^{(n)}, \\ k/n, & \text{if } \hat{X}_k^{(n)} \le x < \hat{X}_{k+1}^{(n)}, \ k = 1, \ldots, n - 1, \\ 1, & \text{if } x \ge \hat{X}_n^{(n)}. \end{cases}$$
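The step-function representation in Problem 3.13.10 translates directly into code: once the sample is sorted, $F_n(x)$ is the number of order statistics not exceeding $x$, divided by $n$. The sketch below uses Python's `bisect_right` for that count and compares the result with the defining sum of indicators; the sample (which deliberately contains a tie) is a hypothetical example.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return F_n as a function, built from the ordered statistics of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right returns the number of order statistics that are <= x
        return bisect_right(ordered, x) / n

    return F_n

def empirical_cdf_direct(sample, x):
    """The defining formula: (1/n) * sum of indicators I(X_k <= x)."""
    return sum(1 for value in sample if value <= x) / len(sample)

sample = [2.5, -1.0, 0.3, 2.5, 4.0]
F_n = empirical_cdf(sample)
points = [-2.0, -1.0, 0.0, 0.3, 2.5, 3.0, 4.0, 10.0]
values = [F_n(x) for x in points]
print(values)
```

When two order statistics coincide, the corresponding interval $[\hat{X}_k^{(n)}, \hat{X}_{k+1}^{(n)})$ is empty, and the function jumps by $2/n$ at that point.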


Problem 3.13.11. Let everything be as in the previous problem, let $x_p$ be the p-quantile
of the distribution function $F = F(x)$ and let $\hat{x}_p(n) = F_n^{-1}(p)$ be the
p-quantile of the distribution $F_n = F_n(x)$. Prove that if $x_p$ is the unique value with
the property $F(x_p-) \le p \le F(x_p)$, then as $n \to \infty$ one has

$$\hat{x}_p(n) \to x_p \quad \text{(P-a.e.)}.$$

Hint. Notice that $\hat{x}_p(n) = \hat{X}_{\lceil np \rceil}^{(n)}$ and convince yourself that for every $\delta > 0$ one
has

$$P\Big\{\liminf_n \hat{X}_{\lceil np \rceil}^{(n)} > x_p - \delta\Big\} = P\Big\{\limsup_n \hat{X}_{\lceil np \rceil}^{(n)} < x_p + \delta\Big\} = 1,$$

where, just as before, $\lceil x \rceil$ stands for the smallest integer that is greater than or equal
to $x$.


Problem 3.13.12. Let $X_1, X_2, \ldots$ be independent and identically distributed random
variables that share one and the same continuous distribution function $F =
F(x)$. In addition, suppose that the following conditions hold: for any given $0 <
p < 1$ the equation $F(x) = p$ has unique solution $x_p$; the derivative $F'(x)$ exists
and is continuous at the point $x = x_p$ and, furthermore, $F'(x_p) > 0$. Let $\hat{X}_{\lceil np \rceil}^{(n)}$
denote the p-quantile in the sample.
Prove that as $n \to \infty$ the random variables $\sqrt{n}\,(\hat{X}_{\lceil np \rceil}^{(n)} - x_p)$ converge in
distribution to a Gaussian random variable $N$ that has zero mean and dispersion
$p(1 - p)(F'(x_p))^{-2}$, i.e.,

$$\sqrt{n}\,\big(\hat{X}_{\lceil np \rceil}^{(n)} - x_p\big) \xrightarrow{\text{law}} N.$$

Hint. Suppose that the random variables $\xi_1, \ldots, \xi_n$ are independent and uniformly
distributed in the interval $[0, 1]$, and let $\hat{\xi}_1^{(n)}, \ldots, \hat{\xi}_n^{(n)}$ denote the associated
ordered statistics. In order to prove the required statement, notice first that the
variables $\hat{X}_{\lceil np \rceil}^{(n)} - x_p$ and $F^{-1}(\hat{\xi}_{\lceil np \rceil}^{(n)}) - F^{-1}(p)$ coincide in distribution, and then
use the statement in [P §3.13, Lemma 2] and the Central Limit Theorem in terms
of Lindeberg’s conditions (see [P §3.4, Theorem 1]).


Chapter 4 
Sequences and Sums of Independent Random 
Variables 


4.1 0-1 Laws 


Problem 4.1.1. Prove the Corollary to Theorem 1 in [P §4.1].
Hint. Use the fact that the distribution function of the variable $\eta$ takes only the
values 0 and 1.


Problem 4.1.2. Prove that if $(\xi_n)_{n \ge 1}$ is some sequence of independent random
variables, then the random variables $\limsup \xi_n$ and $\liminf \xi_n$ are degenerate (i.e., have
vanishing dispersion).

Hint. Show first that $\limsup \xi_n$ and $\liminf \xi_n$ are $\mathcal{X}$-measurable, where $\mathcal{X}$ is the
associated tail $\sigma$-algebra.


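Problem 4.1.3. Let $(\xi_n)_{n \ge 1}$ be any sequence of independent random variables,
let $S_n = \xi_1 + \ldots + \xi_n$ and suppose that the constants $b_n$ are chosen so that
$0 < b_n \uparrow \infty$. Prove that the random variables $\limsup \frac{S_n}{b_n}$ and $\liminf \frac{S_n}{b_n}$ are degenerate (i.e.,
have vanishing dispersion).

Hint. Fix some integer $N$ in the set $\{1, 2, \ldots\}$ and set

$$S_n^{(N)} = \begin{cases} 0, & n \le N, \\ S_n - S_N, & n > N. \end{cases}$$

By using the property $\limsup_n \frac{S_n}{b_n} = \limsup_n \frac{S_n^{(N)}}{b_n}$ conclude that the variable $\limsup_n \frac{S_n}{b_n}$ must
be measurable for $\mathcal{F}_N^\infty = \sigma(\xi_N, \xi_{N+1}, \ldots)$ and, therefore, since $N$ is arbitrarily chosen, must be
measurable also for $\mathcal{X}$.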


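Problem 4.1.4. Let $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$, and $\mathcal{X}(S) = \bigcap_n \mathcal{F}_n^\infty(S)$,
$\mathcal{F}_n^\infty(S) = \sigma\{\omega : S_n, S_{n+1}, \ldots\}$. Prove that all events in the tail $\sigma$-algebra $\mathcal{X}(S)$
are trivial.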


Problem 4.1.5. Let $(\xi_n)_{n \ge 1}$ be any sequence of random variables. Prove that
$\{\limsup \xi_n \ge c\} \supseteq \limsup\{\xi_n > c\}$ for every constant $c$.

Hint. It is enough to notice that

$$\limsup\{\xi_n > c\} = \{\omega : \xi_n(\omega) > c \text{ i.o.}\}.$$


A.N. Shiryaev, Problems in Probability, Problem Books in Mathematics,
DOI 10.1007/978-1-4614-3688-1_4,
© Springer Science+Business Media New York 2012

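Problem 4.1.6. Give examples of tail events $A$ (i.e., events in the $\sigma$-algebra $\mathcal{X} =
\bigcap_{n=1}^\infty \mathcal{F}_n^\infty$, where $\mathcal{F}_n^\infty = \sigma(\xi_n, \xi_{n+1}, \ldots)$, for some sequence of random variables
$(\xi_n)_{n \ge 1}$) that have the property $0 < P(A) < 1$.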


Problem 4.1.7. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with $E\xi_n = 0$ and $E\xi_n^2 = 1$, $n \ge 1$, for which the central limit theorem holds (i.e.,
$P\{S_n/\sqrt{n} \le x\} \to \Phi(x)$, $x \in \mathbb{R}$, where $S_n = \xi_1 + \ldots + \xi_n$). Prove that

$$\limsup_{n \to \infty} n^{-1/2} S_n = +\infty \quad \text{(P-a.e.)}.$$

In particular, this property must hold when $\xi_1, \xi_2, \ldots$ are independent and identically
distributed with $E\xi_1 = 0$ and $E\xi_1^2 = 1$.


Problem 4.1.8. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E|\xi_1| > 0$. Prove that

$$\limsup_{n \to \infty} \Big|\sum_{k=1}^n \xi_k\Big| = +\infty \quad \text{(P-a.e.)}.$$


Problem 4.1.9. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E\xi_1 = 0$ and $E|\xi_1| > 0$ and let $S_n = \xi_1 + \ldots + \xi_n$.
Prove that (P-a. e.)

$$\limsup_{n \to \infty} n^{-1/2} S_n = +\infty \quad \text{and} \quad \liminf_{n \to \infty} n^{-1/2} S_n = -\infty.$$

(Comp. with the statements in Theorem 2 and Problem 4.1.7.)


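Problem 4.1.10. Let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be any sequence of independent $\sigma$-algebras and
let $\mathcal{G} = \bigcap_{n=1}^\infty \sigma\big(\bigcup_{j \ge n} \mathcal{F}_j\big)$. Prove that every set $G \in \mathcal{G}$ satisfies the “zero-one”
law: $P(G)$ is either 0 or 1.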


Problem 4.1.11. Let $A_1, A_2, \ldots$ be some sequence of independent random events,
chosen so that $P(A_n) < 1$, $n \ge 1$, and $P\big(\bigcup_{n=1}^\infty A_n\big) = 1$. Show that $P(\limsup A_n) = 1$.


Problem 4.1.12. Let $A_1, A_2, \ldots$ be any sequence of independent random events
and let $p_n = P(A_n)$, $n \ge 1$. The “zero-one” law implies that the probabilities
$P(\liminf A_n)$ and $P(\limsup A_n)$ must equal either zero or one. Give conditions, expressed
in terms of the probabilities $p_n$, $n \ge 1$, which guarantee that: (a) $P(\liminf A_n) = 0$;
(b) $P(\liminf A_n) = 1$; (c) $P(\limsup A_n) = 0$; and (d) $P(\limsup A_n) = 1$.


Problem 4.1.13. Let $\xi_1, \xi_2, \ldots$ be any sequence of non-degenerate and identically
distributed random variables and let $S_n = \xi_1 + \ldots + \xi_n$. Prove that:
(a) $P\{S_n \in A \text{ i.o.}\} = 0$ or 1 for every Borel set $A \in \mathcal{B}(\mathbb{R})$.
(b) Only the following two relations are possible: either $\limsup S_n = \infty$ (P-a. e.), or
$\limsup S_n = -\infty$ (P-a. e.); furthermore,

$$P\{\limsup S_n = \infty\} = 1, \quad \text{if } \sum_{n=1}^\infty \frac{1}{n} P\{S_n > 0\} = \infty,$$

$$P\{\limsup S_n = -\infty\} = 1, \quad \text{if } \sum_{n=1}^\infty \frac{1}{n} P\{S_n > 0\} < \infty.$$

(c) If the distribution of the variables $\xi_n$ is symmetric, then $\limsup S_n = \infty$ and
$\liminf S_n = -\infty$ (P-a. e.).


Problem 4.1.14. According to the corollary to [P §4.1, Theorem 1], every random
variable $\eta$, which is measurable for the tail $\sigma$-algebra $\mathcal{X}$, associated with some
sequence of independent (say, relative to some measure $P$) random variables
$\xi_1, \xi_2, \ldots$, must be constant P-a. e., i.e., $P\{\eta = C_P\} = 1$, for some constant $C_P$.
Let $Q$ be another probability measure, relative to which the variables $\xi_1, \xi_2, \ldots$ are
also independent. Then it must be the case that $Q\{\eta = C_Q\} = 1$, for some constant
$C_Q$. Can one claim that the constant $C_Q$ must coincide with the constant $C_P$?


Problem 4.1.15. Let $S_m = \xi_1 + \ldots + \xi_m$, $m \ge 1$, where $\xi_1, \xi_2, \ldots$ is some sequence
of independent Bernoulli random variables, such that $P\{\xi_i = 1\} = P\{\xi_i = -1\} =
1/2$, $i \ge 1$. Let $\sigma_0 = \inf\{n \ge 1 : S_n = 0\}$, with the understanding that $\sigma_0 = \infty$
if $S_n \ne 0$ for all $n \ge 1$. Prove that the random walk $(S_m)_{m \ge 0}$, which starts from 0
($S_0 = 0$), is recurrent, in that $P\{\sigma_0 < \infty\} = 1$. By using this property argue that
$P\{S_n = 0 \text{ i.o.}\} = 1$.

Hint. Use the result established in Problem 1.5.7, according to which

$$P\{S_1 \cdots S_{2m} \ne 0\} = \Big(\frac{1}{2}\Big)^{2m} C_{2m}^m, \quad \text{for every } m \ge 1.$$

Problem 4.1.16. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E|\xi_1| < \infty$. Assuming that $E\xi_1 = 0$ and setting
$S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$, prove that one has (P-a. e.)

$$\liminf_{n \to \infty} |S_n| < \infty.$$


Problem 4.1.17. Let $X = (X_1, X_2, \ldots)$ be any infinite sequence of exchangeable
random variables (for the definition of “exchangeable,” see Problem 2.5.4), let
$\mathcal{X}_n = \sigma(X_n, X_{n+1}, \ldots)$ and let $\mathcal{X} = \bigcap_n \mathcal{X}_n$ be the “tail” $\sigma$-algebra, associated
with the sequence $X$. Prove that for every bounded Borel function $g = g(x)$ one
must have (P-a. e.)

$$E[g(X_1) \mid \mathcal{X}] = E[g(X_1) \mid \mathcal{X}_2].$$

Show also that the random variables $X_1, X_2, \ldots$ are conditionally independent
relative to the “tail” $\sigma$-algebra $\mathcal{X}$.


Problem 4.1.18. Let $(X_1, \ldots, X_N)$ be any Gaussian vector of exchangeable random
variables. Prove that there is a vector $(\varepsilon_1, \ldots, \varepsilon_N)$ of independent standard
normal random variables ($\varepsilon_i \sim \mathcal{N}(0, 1)$), for which one can write

$$X_n = a + b\varepsilon_n + c \sum_{i=1}^N \varepsilon_i, \quad 1 \le n \le N,$$

for some choice of the constants $a$, $b$ and $c$.


Problem 4.1.19. Let $(X_1, X_2, \ldots)$ be any infinite Gaussian sequence of exchangeable
random variables. Prove that one can find a sequence $(\varepsilon_0, \varepsilon_1, \ldots)$, that
consists of independent and identically distributed Gaussian random variables $\varepsilon_i \sim
\mathcal{N}(0, 1)$, $i \ge 0$, so that

$$X_n \stackrel{\text{law}}{=} a + b\varepsilon_0 + c\varepsilon_n, \quad n \ge 1.$$


Problem 4.1.20. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with exponential distribution $P\{\xi_i > x\} = e^{-x}$, $x \ge 0$. Consider the event $A_n =
\{\xi_n > h(n)\}$, $n \ge 1$, where $h(n)$ is any of the functions $c \ln n$, $\ln n + c \ln\ln n$, or
$\ln n + \ln\ln n + c \ln\ln\ln n$.

Prove that

$$P(A_n \text{ i.o.}) = \begin{cases} 0, & \text{if } c > 1, \\ 1, & \text{if } c \le 1. \end{cases}$$

Hint. Use the Borel-Cantelli lemma.


Problem 4.1.21. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed Bernoulli random variables with $P\{\xi_n = 1\} = P\{\xi_n = 0\} = 1/2$,
$n \ge 1$. Consider the events

$$A_n = \{\xi_{n+1} = 1, \ldots, \xi_{n + [\log_2 \log_2 n]} = 1\}, \quad n \ge 4.$$

(a) Prove that $P(A_n \text{ i.o.}) = 1$.
Hint. Consider first the sequence of events $A_{2^m}$, $m \ge 2$.

(b) Calculate the probability $P(B_n \text{ i.o.})$, where

$$B_n = \{\xi_{n+1} = 1, \ldots, \xi_{n + [\log_2 n]} = 1\}, \quad n \ge 2.$$


Problem 4.1.22. Let $A_1, A_2, \ldots$ be some sequence of independent events and let

$$B_{\le x} = \Big\{\omega : \lim_n \frac{1}{n} \sum_{k=1}^n I_{A_k}(\omega) \le x\Big\}, \quad x \in \mathbb{R}.$$

Prove that for every $x \in \mathbb{R}$ one has

$$P(B_{\le x}) = 0 \text{ or } 1.$$


4.2 Convergence of Series of Random Variables 


Problem 4.2.1. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
and let $S_n = \xi_1 + \ldots + \xi_n$. By using the “three series theorem” prove that:

(a) If $\sum \xi_n^2 < \infty$ (P-a. e.), then the series $\sum \xi_n$ converges with Probability 1 if
and only if the series $\sum E\xi_i I(|\xi_i| \le 1)$ converges.

(b) If the series $\sum \xi_n$ converges (P-a. e.), then $\sum \xi_n^2 < \infty$ (P-a. e.), if and only if

$$\sum \big(E|\xi_n| I(|\xi_n| \le 1)\big)^2 < \infty.$$


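Problem 4.2.2. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables.
Prove that $\sum \xi_n^2 < \infty$ (P-a. e.), if and only if

$$\sum E\Big[\frac{\xi_n^2}{1 + \xi_n^2}\Big] < \infty.$$

Hint. Use the “three series theorem” and notice that

$$\sum E\Big[\frac{\xi_n^2}{1 + \xi_n^2}\Big] < \infty \iff \Big[\sum E\,\xi_n^2 I(|\xi_n| \le 1) < \infty \ \text{and} \ \sum P\{|\xi_n| > 1\} < \infty\Big].$$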


Problem 4.2.3. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables.
Prove that the following three conditions are equivalent:

1. The series $\sum \xi_n$ converges with probability 1.

2. The series $\sum \xi_n$ converges in probability.

3. The series $\sum \xi_n$ converges in distribution.

Hint. Consider proving the implications (1) ⇒ (3) ⇒ (2) ⇒ (1). The first
implication follows from [P §2.10, Theorem 2]. The implication (3) ⇒ (2) can be
proved by contradiction by using the Prokhorov Theorem. To prove the implication
(2) ⇒ (1), show first that the following inequality holds for arbitrary $m < n$ and
$C > 0$:
$$\mathsf{P}\Bigl\{\max_{m \le k \le n} |S_k - S_m| > 2C\Bigr\} \le 2\max_{m \le k \le n} \mathsf{P}\{|S_n - S_k| > C\}.$$

If the series $\sum \xi_n$ converges in probability, then for every $\varepsilon > 0$ one can find an
integer $m \in \mathbb{N} = \{1, 2, \ldots\}$, so that the following inequality holds for every $n > m$:
$$\max_{m \le k \le n} \mathsf{P}\{|S_n - S_k| > \varepsilon\} < \varepsilon.$$

Finally, conclude from the last relation that the series $\sum \xi_n$ converges with
probability 1.


Problem 4.2.4. By providing appropriate examples, prove that, in general,
in [P §4.2, Theorems 1 and 2] one cannot remove the uniform boundedness
requirement, i.e., the condition: for every given $n \ge 1$ one has $\mathsf{P}\{|\xi_n| \le c_n\} = 1$ for
some appropriate constant $c_n > 0$.

Hint. Consider the sequence of independent random variables $\xi_2, \xi_3, \ldots$ chosen
so that
$$\mathsf{P}\{\xi_n = 0\} = 1 - \frac{2}{n^2}, \quad \mathsf{P}\{\xi_n = n\} = \mathsf{P}\{\xi_n = -n\} = \frac{1}{n^2}, \quad n \ge 2.$$


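Problem 4.2.5. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random
variables with $\mathsf{E}\xi_1 = 0$ and $\mathsf{E}\xi_1^2 < \infty$, and let $S_k = \xi_1 + \ldots + \xi_k$, $k \le n$.
Prove the following one-sided analog (due to A. V. Marshall) of Kolmogorov’s
inequality [P §4.2, (2)]:
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} S_k \ge \varepsilon\Bigr\} \le \frac{\mathsf{E}S_n^2}{\varepsilon^2 + \mathsf{E}S_n^2}, \quad \varepsilon \ge 0.$$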


Problem 4.2.6. Let $\xi_1, \xi_2, \ldots$ be any sequence of random variables. Prove that if
$\sum_{n \ge 1} \mathsf{E}|\xi_n| < \infty$, then the series $\sum_{n \ge 1} \xi_n$ converges absolutely with probability 1.


Problem 4.2.7. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and symmetrically
distributed random variables. Prove that
$$\mathsf{E}\Bigl[\Bigl(\sum_n \xi_n\Bigr)^2 \wedge 1\Bigr] \le \sum_n \mathsf{E}(\xi_n^2 \wedge 1).$$


Problem 4.2.8. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with finite second moments. Prove that the series $\sum \xi_n$ converges in $L^2$ if and only
if the series $\sum \mathsf{E}\xi_n$ and $\sum \mathsf{D}\xi_n$ both converge.


Problem 4.2.9. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
and suppose that the series $\sum \xi_n$ converges (P-a.e.). Prove that the value of the
sum $\sum \xi_n$ does not depend on the order of summation (P-a.e.) if and only if
$$\sum \bigl|\mathsf{E}(\xi_n; |\xi_n| \le 1)\bigr| < \infty.$$


Problem 4.2.10. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with $\mathsf{E}\xi_n = 0$, $n \ge 1$, and suppose that
$$\sum_{n=1}^{\infty} \mathsf{E}\bigl[\xi_n^2 I(|\xi_n| \le 1) + |\xi_n| I(|\xi_n| > 1)\bigr] < \infty.$$

Prove that the series $\sum_{n=1}^{\infty} \xi_n$ converges (P-a.e.).


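Problem 4.2.11. Let $A_1, A_2, \ldots$ be any sequence of independent events
with $\mathsf{P}(A_n) > 0$, $n \ge 1$, and suppose that $\sum_{n=1}^{\infty} \mathsf{P}(A_n) = \infty$. Show that
$$\sum_{j=1}^{n} I(A_j) \Bigm/ \sum_{j=1}^{n} \mathsf{P}(A_j) \to 1 \quad \text{as } n \to \infty \ \text{(P-a.e.)}.$$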


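Problem 4.2.12. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with mean values $\mathsf{E}\xi_n$ and dispersions $\sigma_n^2$, chosen so that $\lim_n \mathsf{E}\xi_n = c$ and
$\sum_{n=1}^{\infty} \sigma_n^{-2} = \infty$. Prove that
$$\sum_{j=1}^{n} \frac{\xi_j}{\sigma_j^2} \Bigm/ \sum_{j=1}^{n} \frac{1}{\sigma_j^2} \to c \quad \text{as } n \to \infty \ \text{(P-a.e.)}.$$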


Problem 4.2.13. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
exponentially distributed random variables, so that $\mathsf{P}\{\xi_1 > x\} = e^{-x}$, $x \ge 0$.

Prove that if the positive numbers $a_n$, $n \ge 1$, are chosen so that the series
$\sum_{n \ge 1} a_n$ converges, then the series $\sum_{n \ge 1} a_n \xi_n$ converges with probability 1 and
also in the $L^p$-sense for every $p \ge 1$.


Problem 4.2.14. Let $(T_1, T_2, \ldots)$ be the moments of jumps for some Poisson
process (see [P §7.10]) and let $\alpha \in (0, 1)$. Prove that the series $\sum_{n=1}^{\infty} T_n^{-1/\alpha}$
converges with probability 1.


Problem 4.2.15. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
chosen so that $\xi_n$ is uniformly distributed in $[-\frac{1}{n}, \frac{1}{n}]$, $n \ge 1$. Prove that (P-a.e.):

(a) the series $\sum_{n=1}^{\infty} \xi_n$ converges;

(b) $\sum_{n=1}^{\infty} |\xi_n| = +\infty$.

Hint. Use the two-series and three-series theorems of A. Khinchin and A. N. Kolmogorov
([P §4.2, Theorem 2] and [P §4.2, Theorem 3]).


Problem 4.2.16. The three-series theorem ([P §4.2, Theorem 3]) guarantees that,
if $\xi_1, \xi_2, \ldots$ is any sequence of independent random variables, then the series
$\sum_{n \ge 1} \xi_n$ converges (P-a.e.) if one can find a constant $c > 0$ for which the
following three series converge (with $\xi_n^c = \xi_n I(|\xi_n| \le c)$):
$$\sum_{n \ge 1} \mathsf{E}\xi_n^c, \quad \sum_{n \ge 1} \mathsf{D}\xi_n^c, \quad \sum_{n \ge 1} \mathsf{P}\{|\xi_n| > c\}.$$

By using appropriate examples, prove that if any one of the above series fails to
converge for some $c > 0$, then the convergence (P-a.e.) of the series $\sum_{n \ge 1} \xi_n$ may
not hold.


Problem 4.2.17. Let $\xi_1, \xi_2, \ldots$ be any sequence of random variables, chosen so
that $\sum_{n=1}^{\infty} \mathsf{E}|\xi_n|^r < \infty$ for some $r > 0$. Prove that $\xi_n \to 0$ as $n \to \infty$ with
probability 1.


Problem 4.2.18. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random
variables with $\mathsf{P}\{\xi_k = 1\} = \mathsf{P}\{\xi_k = -1\} = \frac{1}{2}$, $k \ge 1$. Prove that the random
variable $\sum_{k=1}^{\infty} \xi_k / 2^k$ is well defined (P-a.e.) and is uniformly distributed in $[-1, 1]$.
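
Finite-level illustration (not part of the original text). The claim of Problem 4.2.18 can be made plausible by enumerating the partial sums $\sum_{k=1}^{n} \xi_k/2^k$: under the stated distribution each of the $2^n$ sign patterns is equally likely, and the sketch below checks that they produce $2^n$ distinct, equally spaced values in $(-1, 1)$. The helper name `partial_sum_values` is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

def partial_sum_values(n):
    """All values of sum_{k=1}^n s_k / 2**k over sign patterns s_k in {-1, +1}, sorted."""
    values = []
    for signs in product((-1, 1), repeat=n):
        values.append(sum(Fraction(s, 2 ** k) for k, s in enumerate(signs, start=1)))
    return sorted(values)

if __name__ == "__main__":
    # Each listed value carries probability 2**(-n): a discrete uniform grid in (-1, 1).
    print(partial_sum_values(3))
```

The grid has spacing $2^{1-n}$, so the distribution function of the partial sum differs from the uniform one on $[-1, 1]$ by at most $2^{-n}$; passing to the limit is the content of the problem.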


Problem 4.2.19. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and symmetrically
distributed random variables. Prove that the following conditions are equivalent:


1. The series $\sum \xi_n$ converges with probability 1.
2. $\sum \xi_n^2 < \infty$, P-a.e.
3. $\sum \mathsf{E}(\xi_n^2 \wedge 1) < \infty$.


Problem 4.2.20. Let $\xi$ be any random variable and let $\tilde\xi$ denote its symmetrization,
i.e., $\tilde\xi = \xi - \xi'$, where $\xi'$ is independent of $\xi$ and has the same distribution as $\xi$.
(We suppose that the probability space is sufficiently rich to support both $\xi$ and $\xi'$.)
Let $\mu = \mu(\xi)$ denote the median of the random variable $\xi$, defined by
$\max(\mathsf{P}\{\xi > \mu\}, \mathsf{P}\{\xi < \mu\}) \le \frac{1}{2}$ (comp. with Problem 1.4.23). Prove that for every $a > 0$ one has
$$\mathsf{P}\{|\xi - \mu| > a\} \le 2\mathsf{P}\{|\tilde\xi| > a\} \le 4\mathsf{P}\Bigl\{|\xi| > \frac{a}{2}\Bigr\}.$$


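Problem 4.2.21. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with
$$\mathsf{P}\{\xi_n = 1\} = 2^{-n}, \quad \mathsf{P}\{\xi_n = 0\} = 1 - 2^{-n}.$$

Prove that the series $\sum_{n=1}^{\infty} \xi_n$ converges with probability 1 and the following
relations hold:
$$\mathsf{P}\Bigl\{\sum_{n=1}^{\infty} \xi_n = 0\Bigr\} = \prod_{n=1}^{\infty} (1 - 2^{-n}) > 0$$
and
$$\mathsf{P}\Bigl\{\sum_{n=1}^{\infty} \xi_n = 1\Bigr\} = \sum_{n=1}^{\infty} \frac{1}{2^n - 1} \prod_{n=1}^{\infty} (1 - 2^{-n}).$$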


Problem 4.2.22. Suppose that $\xi_1, \xi_2, \ldots$ is some sequence of independent random
variables and let $S_m = \xi_1 + \ldots + \xi_m$, $m \ge 1$. Prove Etemadi’s inequality: for every
$\varepsilon > 0$ and every integer $n \ge 1$ one has
$$\mathsf{P}\Bigl\{\max_{1 \le m \le n} |S_m| > 4\varepsilon\Bigr\} \le 4\max_{1 \le m \le n} \mathsf{P}\{|S_m| > \varepsilon\}.$$

(This inequality may be used in the proof of the implication (2) ⇒ (1) in
Problem 4.2.3.)


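Problem 4.2.23. Let $\xi_1, \ldots, \xi_n$ be independent random variables with $\mathsf{E}\xi_k = 0$,
chosen so that for any given $h > 0$ one has $\mathsf{E}e^{h\xi_k} < \infty$, $k = 1, \ldots, n$. Setting
$S_k = \xi_1 + \ldots + \xi_k$, $1 \le k \le n$, prove the exponential analog of the Kolmogorov
inequality: for every $\varepsilon > 0$ one has
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} S_k \ge \varepsilon\Bigr\} \le e^{-h\varepsilon}\, \mathsf{E}e^{hS_n}.$$

Hint. Just as in the proof of the (usual) Kolmogorov inequality, one must
introduce the sets $A = \{\max_{1 \le k \le n} S_k \ge \varepsilon\}$ and $A_k = \{S_i < \varepsilon,\ 1 \le i \le k-1,\ S_k \ge \varepsilon\}$,
$1 \le k \le n$, and, by using Jensen’s inequality, establish the following relations:
$$\mathsf{E}e^{hS_n} \ge \mathsf{E}e^{hS_n} I_A = \sum_{k=1}^{n} \mathsf{E}e^{hS_n} I_{A_k} \ge \ldots \ge e^{h\varepsilon}\, \mathsf{P}(A).$$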


Problem 4.2.24. Let $Y$ be a random variable and let $(Y_n)_{n \ge 1}$ be a sequence of random
variables, such that $Y_n \xrightarrow{d} Y$ as $n \to \infty$ (“$\xrightarrow{d}$” means convergence in distribution). In
addition, suppose that $\{N_t;\ t \ge 0\}$ is some family of positive integer-valued random
variables, which are independent of $(Y_n)_{n \ge 1}$ and are such that $N_t \xrightarrow{\mathsf{P}} \infty$ as $t \to \infty$.

Prove that $Y_{N_t} \xrightarrow{d} Y$ as $t \to \infty$.
Hint. Use the method of characteristic functions.


Problem 4.2.25. Let $Y$ be a random variable, let $(Y_n)_{n \ge 1}$ be some sequence of
random variables, chosen so that
$$Y_n \to Y \quad \text{as } n \to \infty \ \text{(P-a.e.)},$$
and let $\{N_t,\ t \ge 0\}$ be some family of positive integer-valued random variables.
(Unlike in Problem 4.2.24, the independence between $(Y_n)_{n \ge 1}$ and $\{N_t,\ t \ge 0\}$ is no
longer assumed.)

Prove the following properties:

(a) If $N_t \to \infty$ (P-a.e.), then $Y_{N_t} \to Y$ as $t \to \infty$ (P-a.e.).

(b) If $N_t \to N$ (P-a.e.), then $Y_{N_t} \to Y_N$ as $t \to \infty$ (P-a.e.).

(c) If $N_t \xrightarrow{\mathsf{P}} \infty$, then $Y_{N_t} \xrightarrow{\mathsf{P}} Y$ as $t \to \infty$.

Hint. For the proof of (c), use the fact that a sequence that converges in
probability must contain a sub-sequence that converges almost surely.

Problem 4.2.26. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random
variables with $\mathsf{P}\{\xi_n = \pm 1\} = 1/2$, $n \ge 1$. Prove that the random variable
$X = \sum_{n=1}^{\infty} \xi_n / n$ is well defined and its distribution function admits a probability density.


Problem 4.2.27. Let &, &,... be some sequence of independent Bernoulli random 
variables with P{é, = 0} = P{é&, = 1} = 1/2,n > 1. Leta, > 0,b, > 0, 
a, +b, = 1,n > 1, and let 


X, = arb. 


Prove that the following statements are equivalent: 

1. i Xn converges almost surely (i.e., limy Th, X, exists and does not 
vanish almost surely); 

2. A2 — Xn) converges almost surely; 

3. [FL]; anbn converges. 

Hint. To prove that (3) = (1), consider the quantities E In X, and D In X„ and 
use the Three-Series Theorem. 


Problem 4.2.28. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with (Cauchy) density $f(x) = \frac{1}{\pi(1 + x^2)}$, $x \in \mathbb{R}$. Prove
that there is no constant $m$ for which the property $\frac{1}{n}\sum_{i=1}^{n} \xi_i \xrightarrow{\mathsf{P}} m$ can hold.


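Problem 4.2.29. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $\mathsf{E}\xi_1 = \mu$ and $\mathsf{D}\xi_1 < \infty$. Prove that
$$\frac{1}{C_n^2} \sum_{1 \le i < j \le n} \xi_i \xi_j \xrightarrow{\mathsf{P}} \mu^2 \quad \text{as } n \to \infty,$$
where, as usual, $C_n^2$ stands for the number of combinations $n$ choose 2 ($= n(n-1)/2$).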


Problem 4.2.30. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random
variables with $\mathsf{P}\{\xi_n = 0\} = \mathsf{P}\{\xi_n = 1\} = 1/2$, $n \ge 1$. Given any $n \ge 1$, let
$Z_n$ denote the length of the maximal block inside the set of values $\xi_1, \ldots, \xi_n$ that
contains only 1’s. Prove that
$$\lim_n \frac{Z_n}{\log_2 n} = 1 \quad \text{(P-a.e.)}.$$

Hint. Prove that with probability 1 $\liminf_n Z_n / \log_2 n \ge 1$ and $\limsup_n Z_n / \log_2 n \le 1$.


Problem 4.2.31. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random
variables with $\mathsf{P}\{\xi_n = 1\} = p_n$ and $\mathsf{P}\{\xi_n = 0\} = 1 - p_n$, $n \ge 1$.

(a) Prove that if $\sum_{k=1}^{\infty} p_k p_{k+1} < \infty$, then the series $\sum_{k=1}^{\infty} \xi_k \xi_{k+1}$ converges
with probability 1.

(b) Prove the Persi Diaconis Theorem: if $p_n = 1/n$ for any $n \ge 1$, then the
random variable $S = \sum_{k=1}^{\infty} \xi_k \xi_{k+1}$ has Poisson distribution of parameter $\lambda = 1$.


4.3 The Strong Law of Large Numbers 


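Problem 4.3.1. Prove that $\mathsf{E}\xi^2 < \infty$ if and only if $\sum_{n=1}^{\infty} n\mathsf{P}\{|\xi| > n\} < \infty$.
Hint. Prove that
$$\sum_{n=1}^{\infty} n\mathsf{P}\{|\xi| > n\} \le \mathsf{E}\xi^2 \le 1 + 4\sum_{n=1}^{\infty} n\mathsf{P}\{|\xi| > n\}.$$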


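Problem 4.3.2. Assuming that $\xi_1, \xi_2, \ldots$ is some sequence of independent and
identically distributed random variables with partial sums $S_n = \xi_1 + \ldots + \xi_n$, prove the
Marcinkiewicz-Zygmund strong law of large numbers: if $\mathsf{E}|\xi_1|^{\alpha} < \infty$ for some
$0 < \alpha < 1$, then $S_n / n^{1/\alpha} \to 0$ (P-a.e.), and if $\mathsf{E}|\xi_1|^{\beta} < \infty$ for some $1 \le \beta < 2$,
then $(S_n - n\mathsf{E}\xi_1) / n^{1/\beta} \to 0$ (P-a.e.).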


Problem 4.3.3. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $\mathsf{E}|\xi_1| = \infty$. Prove that the following relation
holds for any sequence of real numbers $(a_n)_{n \ge 1}$:
$$\limsup_n \Bigl|\frac{S_n}{n} - a_n\Bigr| = \infty \quad \text{(P-a.e.)}.$$


Problem 4.3.4. Can one claim that all rational numbers in the interval [0, 1) are 
normal, in the context of Example 2 in [P §4.3, 4]? 


Problem 4.3.5. Consider the decimal expansions $\omega = 0.\omega_1\omega_2\ldots$ of the numbers
$\omega \in [0, 1)$.

(a) Formulate the decimal-expansions analog of the strong law of large numbers,
formulated in [P §4.3, 4] for binary expansions.

(b) In terms of decimal expansions, are the rational numbers normal, in the sense
that $\frac{1}{n}\sum_{k=1}^{n} I(\xi_k(\omega) = i) \to \frac{1}{10}$ (P-a.e.) as $n \to \infty$, for any $i = 0, 1, \ldots, 9$?

(c) Prove Champernowne’s proposition: the number
$$\omega = 0.123456789101112\ldots,$$
where the (decimal) expansion consists of all positive integers (written as decimals)
arranged in an increasing order, is normal, as a decimal expansion (see [P §4.3,
Example 2]).
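
Numerical illustration (not part of the original text). Part (c) can be explored empirically by tabulating single-digit frequencies in the first digits of Champernowne's number. The function names below are illustrative assumptions, and the computation proves nothing: convergence to 1/10 is slow, and a finite table only suggests the limit.

```python
from collections import Counter
from itertools import count

def champernowne_digits(n_digits):
    """First n_digits decimal digits (as characters) of 0.123456789101112..."""
    digits = []
    for k in count(1):
        digits.extend(str(k))          # append the decimal digits of k
        if len(digits) >= n_digits:
            return digits[:n_digits]

def digit_frequencies(n_digits):
    """Relative frequency of each digit 0..9 among the first n_digits digits."""
    counts = Counter(champernowne_digits(n_digits))
    return {d: counts.get(str(d), 0) / n_digits for d in range(10)}

if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 5):
        print(n, {d: round(f, 4) for d, f in digit_frequencies(n).items()})
```

The digit 0 is visibly under-represented for small $n$, since it never appears as a leading digit; the bias fades only logarithmically.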

Problem 4.3.6. (N. Etemadi) Prove that the statement in [P §4.3, Theorem 3]
remains in force even if the “independence” of the random variables $\xi_1, \xi_2, \ldots$ is
replaced with “pairwise independence”.


Problem 4.3.7. Prove that under the conditions in [P §4.3, Theorem 3] one can
also claim convergence in the first-order mean: $\mathsf{E}\bigl|\frac{S_n}{n} - m\bigr| \to 0$ as $n \to \infty$.


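Problem 4.3.8. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables with $\mathsf{E}|\xi_1|^2 < \infty$. Prove that
$$n\mathsf{P}\{|\xi_1| \ge \varepsilon\sqrt{n}\} \to 0 \quad \text{and} \quad \frac{1}{\sqrt{n}} \max_{k \le n} |\xi_k| \xrightarrow{\mathsf{P}} 0.$$
(Comp. with Problem 2.10.41.)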


Problem 4.3.9. Construct a sequence of independent random variables
$\xi_1, \xi_2, \ldots$ with the property that $\lim_{n \to \infty} \frac{1}{n}(\xi_1 + \cdots + \xi_n)$ exists as a “limit in probability” but
not as a “limit almost surely”.

Hint. Consider the independent random variables $\xi_2, \xi_3, \ldots$, chosen so that
$$\mathsf{P}\{\xi_n = 0\} = 1 - \frac{1}{n\ln n}, \quad \mathsf{P}\{\xi_n = \pm n\} = \frac{1}{2n\ln n}, \quad n \ge 2.$$

By using the second Borel-Cantelli lemma, in conjunction with the fact that $\mathsf{E}S_n^2 \le n^2 / \ln n$
and that $\sum_{n \ge 2} \mathsf{P}\{|\xi_n| \ge n\} = \infty$, conclude that $\mathsf{P}\{|\xi_n| \ge n \text{ i.o.}\} = 1$.


Problem 4.3.10. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent random variables,
chosen so that $\mathsf{P}\{\xi_n = \pm n^{\alpha}\} = 1/2$. Prove that this sequence satisfies the strong
law of large numbers if and only if $\alpha < 1/2$.


Problem 4.3.11. Prove that the Kolmogorov strong law of large numbers
(Theorem 3) can be formulated in the following equivalent form: for any sequence
of independent and identically distributed random variables $\xi_1, \xi_2, \ldots$ one has
$$\mathsf{E}|\xi_1| < \infty \iff n^{-1}S_n \to \mathsf{E}\xi_1 \ \text{(P-a.e.)},$$
$$\mathsf{E}|\xi_1| = \infty \iff \limsup_n n^{-1}|S_n| = +\infty \ \text{(P-a.e.)}.$$

In addition, prove that the first relation remains valid even if “independent” is
replaced by “pair-wise independent”.

Problem 4.3.12. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables. Prove that
$$\mathsf{E}\sup_n \Bigl|\frac{\xi_n}{n}\Bigr| < \infty \iff \mathsf{E}|\xi_1| \ln^{+} |\xi_1| < \infty.$$


Problem 4.3.13. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables and let $S_n = \xi_1 + \cdots + \xi_n$, $n \ge 1$. Prove that for any given $\alpha \in (0, 1/2]$
one of the following properties holds:

(a) $n^{-\alpha} S_n \to \infty$ (P-a.e.);

(b) $n^{-\alpha} S_n \to -\infty$ (P-a.e.);

(c) $\limsup_n n^{-\alpha} S_n = \infty$, $\liminf_n n^{-\alpha} S_n = -\infty$ (P-a.e.).


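Problem 4.3.14. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables and let $S_0 = 0$ and $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$. Prove that:
(a) If $\varepsilon > 0$ then
$$\sum_{n=1}^{\infty} \mathsf{P}\{|S_n| > n\varepsilon\} < \infty \iff \mathsf{E}\xi_1 = 0, \ \mathsf{E}\xi_1^2 < \infty.$$

(b) If $\mathsf{E}\xi_1 < 0$, then for any $p > 1$ one has
$$\mathsf{E}\bigl(\sup_{n \ge 0} S_n\bigr)^{p-1} < \infty \iff \mathsf{E}(\xi_1^{+})^{p} < \infty.$$

(c) If $\mathsf{E}\xi_1 = 0$ and $1 < p \le 2$, then there is a constant $C_p$, for which the
following relations are in force:
$$\sum_{n=1}^{\infty} \mathsf{P}\Bigl\{\max_{k \le n} S_k \ge n\Bigr\} \le C_p \mathsf{E}|\xi_1|^p, \quad \sum_{n=1}^{\infty} \mathsf{P}\Bigl\{\max_{k \le n} |S_k| \ge n\Bigr\} \le 2C_p \mathsf{E}|\xi_1|^p.$$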

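(d) If $\mathsf{E}\xi_1 = 0$, $\mathsf{E}\xi_1^2 = \sigma^2 < \infty$ and $M(\varepsilon) = \sup_{n \ge 0}(S_n - n\varepsilon)$, $\varepsilon > 0$, then
$$\lim_{\varepsilon \to 0} \varepsilon\, \mathsf{E}M(\varepsilon) = \sigma^2 / 2.$$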


Problem 4.3.15. (On [P §4.3, Theorem 2].) Let $\xi_1, \xi_2, \ldots$ be independent random
variables, chosen so that
$$\mathsf{P}\{\xi_n = 1\} = \mathsf{P}\{\xi_n = -1\} = \frac{1}{2}(1 - 2^{-n}),$$
$$\mathsf{P}\{\xi_n = 2^n\} = \mathsf{P}\{\xi_n = -2^n\} = 2^{-(n+1)}.$$

Prove that $\sum_{n=1}^{\infty} \frac{\mathsf{D}\xi_n}{n^2} = \infty$ (comp. with [P §4.3, (3)]), but nevertheless one has
(P-a.e.)
$$\frac{\xi_1 + \ldots + \xi_n}{n} \to 0,$$
i.e., the strong law of large numbers holds, in that [P §4.3, (4)] holds (notice that
$\mathsf{E}\xi_n = 0$, $n \ge 1$).


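Problem 4.3.16. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables with $\mathsf{E}|\xi_1| = \infty$. Prove that at least one of the following two properties
must be satisfied:
$$\mathsf{P}\Bigl\{\limsup_n \frac{1}{n}\sum_{k=1}^{n} \xi_k = +\infty\Bigr\} = 1 \quad \text{or} \quad \mathsf{P}\Bigl\{\liminf_n \frac{1}{n}\sum_{k=1}^{n} \xi_k = -\infty\Bigr\} = 1.$$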


Problem 4.3.17. As a generalization of the Kolmogorov strong law of large
numbers [P §4.3, Theorem 2] prove the following result, which is due to M. Loève:
if $\xi_1, \xi_2, \ldots$ are independent random variables, chosen so that
$$\sum_{n=1}^{\infty} \frac{\mathsf{E}|\xi_n|^{\alpha_n}}{n^{\alpha_n}} < \infty,$$
where $0 < \alpha_n \le 2$, and, moreover, $\mathsf{E}\xi_n = 0$ for $1 \le \alpha_n \le 2$, then $\frac{1}{n}\sum_{k=1}^{n} \xi_k \to 0$
almost everywhere.


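Problem 4.3.18. Give an example of a sequence $\xi_1, \xi_2, \ldots$ of independent random
variables such that $\mathsf{E}\xi_n = 0$, $n \ge 1$, and
$$\lim_n \frac{1}{n}\sum_{i=1}^{n} \xi_i = -\infty \quad \text{(P-a.e.)}.$$

Hint. Choose, for example, the random variables $\xi_n$ so that $\mathsf{P}\{\xi_n = -n\} = 1 - n^{-2}$
and $\mathsf{P}\{\xi_n = n^3 - n\} = n^{-2}$, $n \ge 1$.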
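
Sanity check (not part of the original text). The two-point distribution in the hint is centred by construction; the sketch below confirms $\mathsf{E}\xi_n = 0$ in exact rational arithmetic. The function name `mean_xi` is an illustrative assumption.

```python
from fractions import Fraction

def mean_xi(n):
    """Exact mean of xi_n with P{xi_n = -n} = 1 - n**-2 and P{xi_n = n**3 - n} = n**-2."""
    p_big = Fraction(1, n ** 2)                      # probability of the large positive value
    return Fraction(-n) * (1 - p_big) + Fraction(n ** 3 - n) * p_big

if __name__ == "__main__":
    print([mean_xi(n) for n in range(1, 8)])         # every entry is the exact fraction 0
```

Since $\sum_n n^{-2} < \infty$, the first Borel-Cantelli lemma shows that $\xi_n = -n$ for all sufficiently large $n$ with probability 1, which is what drives the averages to $-\infty$.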


Problem 4.3.19. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
such that $\mathsf{E}\xi_k = 0$, $k \ge 1$. Setting
$$\xi_k^{(n)} = \begin{cases} \xi_k, & \text{if } |\xi_k| \le n, \\ 0, & \text{if } |\xi_k| > n, \end{cases}$$
prove the following version of the law of large numbers, which is due to A. N. Kolmogorov:
in order to claim that
$$\frac{1}{n}\sum_{k=1}^{n} \xi_k \xrightarrow{\mathsf{P}} 0 \quad \text{as } n \to \infty,$$
it is necessary and sufficient that the following relations hold as $n \to \infty$:
$$\sum_{k=1}^{n} \mathsf{P}\{|\xi_k| > n\} \to 0,$$
$$\frac{1}{n}\sum_{k=1}^{n} \mathsf{E}\xi_k^{(n)} \to 0,$$
$$\frac{1}{n^2}\sum_{k=1}^{n} \mathsf{D}\xi_k^{(n)} \to 0.$$

By using appropriate examples, prove that the last condition (as a necessary
condition) cannot be replaced by the condition
$$\frac{1}{n^2}\sum_{k=1}^{n} \mathsf{E}\bigl(\xi_k^{(n)}\bigr)^2 \to 0.$$


Problem 4.3.20. Let $N = (N_t)_{t \ge 0}$ be the renewal process from Example 4 in
[P §4.3, 4], i.e., $N_t = \sum_{n=1}^{\infty} I(T_n \le t)$, where $T_n = \sigma_1 + \ldots + \sigma_n$ and $(\sigma_n)_{n \ge 1}$ is
some sequence of independent and identically distributed random variables, chosen
so that $\mathsf{E}\sigma_1 = \mu$, $0 < \mu < \infty$. By the strong law of large numbers, one has $\frac{N_t}{t} \to \frac{1}{\mu}$
as $t \to \infty$ (P-a.e.). Prove that
$$\mathsf{E}\Bigl(\frac{N_t}{t}\Bigr)^{r} \to \frac{1}{\mu^{r}} \quad \text{as } t \to \infty, \ \text{for every } r > 0,$$
and notice that these results remain valid even with $\mu = \infty$, in which case $1/\mu = 0$.


Problem 4.3.21. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, set $S_n = \xi_1 + \ldots + \xi_n$, and let $\{N_t,\ t \ge 0\}$ be any
family of random variables that take values in the set $\{1, 2, \ldots\}$ and are chosen so
that $N_t \to \infty$ as $t \to \infty$ (P-a.e.).

Prove that:
(a) If $\mathsf{E}|\xi_1|^r < \infty$, $r > 0$, then
$$\frac{\xi_{N_t}}{(N_t)^{1/r}} \to 0 \quad \text{as } t \to \infty \ \text{(P-a.e.)},$$
and if, moreover, $N_t / t \to \lambda$ (P-a.e.) for some $0 < \lambda < \infty$, then
$$\frac{\xi_{N_t}}{t^{1/r}} \to 0 \quad \text{as } t \to \infty \ \text{(P-a.e.)}.$$

(b) If $\mathsf{E}|\xi_1|^r < \infty$ for some $0 < r < 2$, with the understanding that $\mathsf{E}\xi_1 = 0$ if
$1 \le r < 2$, then
$$\frac{S_{N_t}}{(N_t)^{1/r}} \to 0 \quad \text{as } t \to \infty \ \text{(P-a.e.)},$$
and if, in addition, $N_t / t \to \lambda$ (P-a.e.) for some $0 < \lambda < \infty$, then
$$\frac{S_{N_t}}{t^{1/r}} \to 0 \quad \text{as } t \to \infty \ \text{(P-a.e.)}.$$

(c) If $\mathsf{E}|\xi_1| < \infty$ and $\mathsf{E}\xi_1 = \mu$, then
$$\frac{S_{N_t}}{N_t} \to \mu \quad \text{as } t \to \infty \ \text{(P-a.e.)},$$
and if, in addition, $N_t / t \to \lambda$ (P-a.e.), where $0 < \lambda < \infty$, then
$$\frac{S_{N_t}}{t} \to \mu\lambda \quad \text{as } t \to \infty \ \text{(P-a.e.)}.$$

Hint. To prove (a), use the Borel-Cantelli lemma, in conjunction with the result
established in Problem 4.2.25(a). To prove (b), use the Marcinkiewicz-Zygmund
strong law of large numbers, established in Problem 4.3.2. To prove (c), use
Kolmogorov’s strong law of large numbers [P §4.3, Theorem 3] and recall the
statement in Problem 4.2.25(a).


Problem 4.3.22. Let $f = f(x)$ be any bounded and continuous function defined
on $(0, \infty)$. Prove that for every $a > 0$ and every $x > 0$ one must have
$$\lim_{n \to \infty} \sum_{k=1}^{\infty} f\Bigl(x + \frac{k}{n}\Bigr) e^{-an} \frac{(an)^k}{k!} = f(x + a).$$
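
Numerical illustration (not part of the original text). Reading the weights $e^{-an}(an)^k/k!$ as Poisson probabilities with parameter $an$, the sum in Problem 4.3.22 is an average of $f$ at points $x + k/n$ that concentrate near $x + a$. The sketch below evaluates the sum for a concrete $f$; the function name `poisson_average`, the truncation rule for `k_max`, and the choice $f = \sin$ are illustrative assumptions.

```python
import math

def poisson_average(f, x, a, n, k_max=None):
    """Evaluate sum_{k>=1} f(x + k/n) * exp(-a*n) * (a*n)**k / k!, truncated at k_max."""
    lam = a * n
    if k_max is None:
        # Assumption: terms beyond about 12 standard deviations above the mean are negligible.
        k_max = int(lam + 12 * math.sqrt(lam) + 20)
    total = 0.0
    for k in range(1, k_max + 1):
        # Work with logarithms of the Poisson weights to avoid overflow in (a*n)**k and k!.
        log_w = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += f(x + k / n) * math.exp(log_w)
    return total

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, poisson_average(math.sin, 0.5, 1.0, n), math.sin(1.5))
```

The printed values approach $\sin(1.5)$ as $n$ grows, in line with the law of large numbers for Poisson variables.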


Problem 4.3.23. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables, chosen so that $\mathsf{E}|\xi_1| < \infty$ and $\mathsf{E}\xi_1 = \mu$. Prove that as $n \to \infty$ one has:


(b) a = A —> u (P-ae.), foranyO<a <1. 
k=1 


4.4 The Law of the Iterated Logarithm 


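Problem 4.4.1. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $\xi_n \sim \mathcal{N}(0, 1)$. Prove that:

(a) $$\mathsf{P}\Bigl\{\limsup_n \frac{\xi_n}{\sqrt{2\ln n}} = 1\Bigr\} = 1;$$

(b) $$\mathsf{P}\{\xi_n > a_n \text{ i.o.}\} = \begin{cases} 0, & \text{if } \sum_n \mathsf{P}\{\xi_1 > a_n\} < \infty, \\ 1, & \text{if } \sum_n \mathsf{P}\{\xi_1 > a_n\} = \infty. \end{cases}$$

Hint. (a) Given some fixed $c > 0$ and setting $A_n = \{\xi_n > c\sqrt{2\ln n}\}$, by using
[P §4.4, (10)] one can show that
$$\mathsf{P}(A_n) \sim \frac{n^{-c^2}}{c\sqrt{4\pi\ln n}}.$$

The required statement then follows from the Borel-Cantelli Lemma ($\sum \mathsf{P}(A_n) < \infty$
for $c > 1$ and $\sum \mathsf{P}(A_n) = \infty$ for $0 < c < 1$), in conjunction with the
implications [P §4.4, (3) and (4)].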
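
Numerical illustration (not part of the original text). The asymptotic relation for $\mathsf{P}(A_n)$ in the hint to Problem 4.4.1 is the standard Gaussian tail estimate $\mathsf{P}\{\xi > x\} \sim e^{-x^2/2}/(x\sqrt{2\pi})$ evaluated at $x = c\sqrt{2\ln n}$. The sketch below compares it with the exact tail computed through the complementary error function; the function names are illustrative assumptions.

```python
import math

def exact_tail(n, c):
    """P{xi > c*sqrt(2 ln n)} for xi ~ N(0, 1), via the complementary error function."""
    x = c * math.sqrt(2.0 * math.log(n))
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def asymptotic_tail(n, c):
    """The approximation n**(-c**2) / (c * sqrt(4*pi*ln n)) from the hint."""
    return n ** (-c * c) / (c * math.sqrt(4.0 * math.pi * math.log(n)))

if __name__ == "__main__":
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        print(n, exact_tail(n, 1.0) / asymptotic_tail(n, 1.0))
```

The ratio stays below 1 and increases towards 1, slowly, because the relative error of the tail estimate is of order $1/x^2 = 1/(2c^2\ln n)$.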


Problem 4.4.2. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
which are identically distributed with Poisson law of parameter $\lambda > 0$. Prove that
(independently of $\lambda$) one has
$$\mathsf{P}\Bigl\{\limsup_n \frac{\xi_n \ln\ln n}{\ln n} = 1\Bigr\} = 1.$$

Hint. Consider the event $A_n = \{\xi_n > c g_n\}$, where $c > 0$ and $g_n = \frac{\ln n}{\ln\ln n}$,
notice that $\sum \mathsf{P}(A_n) < \infty$ for $c > 1$, and $\sum \mathsf{P}(A_n) = \infty$ for $0 < c < 1$. Then use
the Borel-Cantelli Lemma and the implications [P §4.4, (3) and (4)].


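Problem 4.4.3. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent and identically
distributed random variables with
$$\mathsf{E}e^{it\xi_1} = e^{-|t|^{\alpha}}, \quad 0 < \alpha < 2$$
(comp. with [P §3.6, 4]). Prove that
$$\mathsf{P}\Bigl\{\limsup_n \Bigl|\frac{S_n}{n^{1/\alpha}}\Bigr|^{\frac{1}{\ln\ln n}} = e^{1/\alpha}\Bigr\} = 1.$$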

Problem 4.4.4. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random variables with
$\mathsf{P}\{\xi_n = \pm 1\} = 1/2$ and let $S_n = \xi_1 + \ldots + \xi_n$. Prove the following result, which
is due to G. H. Hardy and J. E. Littlewood:
$$\limsup_n \frac{|S_n|}{\sqrt{2n\ln n}} \le 1 \quad \text{with probability 1.}$$

Hint. By showing that
$$\mathsf{P}\{S_n \ge a\} \le e^{-ha}\, \mathsf{E}e^{hS_n} \quad \text{for } a > 0,\ h > 0,$$
and that $\cosh h \le \exp\{h^2/2\}$, conclude that
$$\mathsf{P}\{S_n > a\} \le \exp\Bigl\{-\frac{a^2}{2n}\Bigr\}.$$

Finally, set $a = (1 + \varepsilon)\sqrt{2n\ln n}$, $\varepsilon > 0$, and use the Borel-Cantelli Lemma. (See also
the bibliographical notes for [P Chap. 4] and [P Chap. 7] at the end of the book
“Probability-2”.)


Problem 4.4.5. Prove the following generalization of the inequality [P §4.4, (9)].
Let $\xi_1, \ldots, \xi_n$ be independent random variables and set $S_0 = 0$ and $S_k = \xi_1 + \ldots + \xi_k$,
$k \le n$. Then for every real $a$ one has (Lévy’s inequality):
$$\mathsf{P}\Bigl\{\max_{0 \le k \le n} [S_k + \mu(S_n - S_k)] > a\Bigr\} \le 2\mathsf{P}\{S_n > a\},$$
where $\mu(\xi)$ stands for the median of the random variable $\xi$, i.e., the constant defined
by the relation
$$\max(\mathsf{P}\{\xi > \mu(\xi)\}, \mathsf{P}\{\xi < \mu(\xi)\}) \le \frac{1}{2}.$$

(For the various definitions of the notion of “median,” see Problem 1.4.23.)
Hint. Let
$$\tau = \inf\{0 \le k \le n : S_k + \mu(S_n - S_k) > a\},$$
with the understanding that $\inf \varnothing = n + 1$, and prove that
$$\mathsf{P}\{S_n > a\} \ge \frac{1}{2}\sum_{k=0}^{n} \mathsf{P}\{\tau = k\} = \frac{1}{2}\mathsf{P}\Bigl\{\max_{0 \le k \le n} [S_k + \mu(S_n - S_k)] > a\Bigr\}.$$


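Problem 4.4.6. Let $\xi_1, \ldots, \xi_n$ be independent random variables with $\mathsf{E}\xi_i = 0$,
$1 \le i \le n$, and let $S_k = \xi_1 + \ldots + \xi_k$. Prove that
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} S_k > a\Bigr\} \le 2\mathsf{P}\{S_n \ge a - \mathsf{E}|S_n|\} \quad \text{for all } a > 0.$$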

Problem 4.4.7. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random
variables, such that $\mathsf{E}\xi_i = 0$, $\sigma^2 = \mathsf{E}\xi_i^2 < \infty$, and $|\xi_i| \le c$ (P-a.e.), $i \le n$. Setting
$S_n = \xi_1 + \cdots + \xi_n$, prove that
$$\mathsf{E}e^{xS_n} \le \exp\{2^{-1} n x^2 \sigma^2 (1 + xc)\} \quad \text{for every } 0 \le x \le 2c^{-1}.$$

Prove under the same assumptions that if $(a_n)$ is some sequence of real numbers,
chosen so that $a_n / \sqrt{n} \to \infty$ and $a_n = o(n)$ as $n \to \infty$, then for every $\varepsilon > 0$ and
for all sufficiently large $n$ one has
$$\mathsf{P}\{S_n > a_n\} > \exp\Bigl\{-\frac{a_n^2}{2n\sigma^2}(1 + \varepsilon)\Bigr\}.$$


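Problem 4.4.8. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random
variables, such that $\mathsf{E}\xi_i = 0$ and $|\xi_i| \le c$ (P-a.e.), $i \le n$. Setting $S_n = \xi_1 + \ldots + \xi_n$
and $D_n = \sum_{i=1}^{n} \mathsf{D}\xi_i$, prove the Prokhorov inequality:
$$\mathsf{P}\{S_n \ge a\} \le \exp\Bigl\{-\frac{a}{2c}\operatorname{arcsinh}\frac{ac}{2D_n}\Bigr\}, \quad a \in \mathbb{R}.$$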


Problem 4.4.9. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, such that $\mathsf{E}|\xi_1|^{\alpha} = \infty$ for some $\alpha < 2$. Prove that
$$\limsup_n \frac{|S_n|}{n^{1/\alpha}} = \infty \quad \text{(P-a.s.)}$$
(and that, consequently, the law of the iterated logarithm does not hold for this
particular sequence).


Problem 4.4.10. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $\mathsf{E}\xi_1 = 0$ and $\mathsf{E}\xi_1^2 = 1$. Setting $S_n = \xi_1 + \ldots + \xi_n$,
$n \ge 1$, prove that with probability 1 the collection of all limiting points of the
sequence $\bigl(\frac{S_n}{\sqrt{2n\ln\ln n}}\bigr)_{n \ge 1}$ coincides with the interval $[-1, 1]$.

Problem 4.4.11. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, all having normal distribution $\mathcal{N}(m, \sigma^2)$. Setting
$$\overline{m}_n = \frac{1}{n}\sum_{i=1}^{n} \xi_i$$
and using the result in the previous problem, prove that with probability 1 the
collection of limiting points of the sequence $\bigl(\frac{\sqrt{n}\,(\overline{m}_n - m)}{\sqrt{2\ln\ln n}}\bigr)_{n \ge 1}$ coincides with the
interval $[-\sigma, \sigma]$.


Problem 4.4.12. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables that share one and the same continuous distribution
function $F(x)$, $x \in \mathbb{R}$, and let
$$F_n(x; \omega) = \frac{1}{n}\sum_{k=1}^{n} I(\xi_k(\omega) \le x), \quad x \in \mathbb{R}, \ n \ge 1,$$
be the associated sequence of empirical distribution functions.

Prove that with probability 1
$$\limsup_n \frac{\sqrt{n}\, \sup_x |F_n(x; \omega) - F(x)|}{\sqrt{2\ln\ln n}} = \sup_x \sqrt{F(x)(1 - F(x))}.$$


Problem 4.4.13. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with exponential distribution, chosen so that $\mathsf{P}\{\xi_1 > x\} = e^{-x}$,
$x \ge 0$. By using the argument of the Borel-Cantelli lemma (see also
Problem 4.1.20), prove that with probability 1
$$\limsup_n \frac{\xi_n}{\ln n} = 1, \quad \limsup_n \frac{\xi_n - \ln n}{\ln\ln n} = 1, \quad \text{and} \quad \limsup_n \frac{\xi_n - \ln n - \ln\ln n}{\ln\ln\ln n} = 1.$$

How will this result change if $\mathsf{P}\{\xi_1 > x\} = e^{-\lambda x}$, $x \ge 0$, for some $\lambda > 0$?


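Problem 4.4.14. Let everything be as in the previous Problem (with $\mathsf{P}\{\xi_1 > x\} = e^{-\lambda x}$,
$x \ge 0$, $\lambda > 0$). Setting $M_n = \max(\xi_1, \ldots, \xi_n)$, prove that
$$\limsup_n \frac{M_n}{\ln n} = \limsup_n \frac{\xi_n}{\ln n} \quad \text{(P-a.e.)}.$$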


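Problem 4.4.15. Let $\xi_1, \ldots, \xi_n$ be independent random variables and set $S_0 = 0$
and $S_k = \xi_1 + \cdots + \xi_k$, $k \le n$. Prove that:
(a) (As a continuation of Problem 4.4.5)
$$\mathsf{P}\Bigl\{\max_{0 \le k \le n} |S_k + \mu(S_n - S_k)| \ge a\Bigr\} \le 2\mathsf{P}\{|S_n| \ge a\},$$
where $\mu(\xi)$ stands for the median of the random variable $\xi$.
(b) If $\xi_1, \ldots, \xi_n$ are identically distributed and symmetric, then
$$1 - e^{-n\mathsf{P}\{|\xi_1| > x\}} \le \mathsf{P}\Bigl\{\max_{1 \le k \le n} |\xi_k| > x\Bigr\} \le 2\mathsf{P}\{|S_n| > x\}.$$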


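Problem 4.4.16. Let $\xi_1, \ldots, \xi_n$ be independent random variables and set
$S_k = \xi_1 + \ldots + \xi_k$, $1 \le k \le n$. Prove the Skorokhod inequality: for every $\varepsilon > 0$ one has
$$\inf_{1 \le k \le n} \mathsf{P}\{|S_n - S_k| < \varepsilon\} \cdot \mathsf{P}\Bigl\{\max_{1 \le k \le n} |S_k| \ge 2\varepsilon\Bigr\} \le \mathsf{P}\{|S_n| \ge \varepsilon\}.$$

Hint. Consider the stopping time $\tau = \inf\{1 \le k \le n : |S_k| \ge 2\varepsilon\}$ (with
the understanding that $\inf \varnothing = n + 1$) and use the idea outlined in the hint for
Problem 4.4.5.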


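Problem 4.4.17. Let $\xi_1, \ldots, \xi_n$ be some random variables and set
$S_k = \xi_1 + \ldots + \xi_k$, $1 \le k \le n$. Prove that for every $\varepsilon > 0$ one has
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} |\xi_k| \ge \varepsilon\Bigr\} \le 2\mathsf{P}\Bigl\{\max_{1 \le k \le n} |S_k| \ge \frac{\varepsilon}{2}\Bigr\},$$
and if, furthermore, the random variables $\xi_1, \ldots, \xi_n$ are independent and have
symmetric distributions, then for every $\varepsilon > 0$ one has
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} |\xi_k| \ge \varepsilon\Bigr\} \le 2\mathsf{P}\Bigl\{|S_n| \ge \frac{\varepsilon}{2}\Bigr\}.$$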


4.5 Rate of Convergence in the Strong Law 
of Large Numbers 


Problem 4.5.1. Prove the inequalities [P §4.5, (8) and (20)].
Hint. Set $\tilde\xi = -\xi$ and convince yourself that
$$\tilde{H}(a) = \sup_{\lambda} [a\lambda - \tilde\psi(\lambda)] = H(-a).$$

In addition, use the inequality [P §4.5, (7)].


Problem 4.5.2. Consider the set $\Lambda$ defined in [P §4.5, (5)] and verify the claim
that in the interior of the set $\Lambda$ the function $\psi(\lambda)$ is convex (in fact, strictly convex,
if the random variable $\xi$ is non-degenerate) and infinitely differentiable.

Hint. Setting $\lambda_* = \inf_{\lambda \in \Lambda} \lambda$ and $\lambda^* = \sup_{\lambda \in \Lambda} \lambda$, prove that (under the
assumption [P §4.5, (3)])
$$-\infty \le \lambda_* < 0 < \lambda^* \le \infty.$$

Then prove that the function $\varphi(\lambda) = \mathsf{E}e^{\lambda\xi}$ is infinitely differentiable on the interval
$(\lambda_*, \lambda^*)$. The convexity of the function $\psi(\lambda) = \ln\varphi(\lambda)$ follows from the Hölder
inequality.


Problem 4.5.3. Assuming that the random variable $\xi$ is non-degenerate, prove that
the function $H(a)$ is differentiable on the entire real line and is also convex.


Problem 4.5.4. Prove the following inversion formula for the Cramér transform:
$$\psi(\lambda) = \sup_{a} [\lambda a - H(a)],$$
for all $\lambda$, except, perhaps, at the endpoints of the set $\Lambda = \{\lambda : \psi(\lambda) < \infty\}$.


Problem 4.5.5. Let $S_n = \xi_1 + \ldots + \xi_n$, where $\xi_1, \ldots, \xi_n$, $n \ge 1$, are assumed to be
independent and identically distributed simple random variables with $\mathsf{E}\xi_1 < 0$ and
$\mathsf{P}\{\xi_1 > 0\} > 0$. Let $\varphi(\lambda) = \mathsf{E}e^{\lambda\xi_1}$ and let $\inf_{\lambda} \varphi(\lambda) = \rho$ ($0 < \rho < 1$).

Prove the Chernoff theorem:
$$\lim_n \frac{1}{n}\ln \mathsf{P}\{S_n \ge 0\} = \ln\rho. \qquad (*)$$


Problem 4.5.6. By using (*), prove that in the Bernoullian case (i.e., when
$\mathsf{P}\{\xi_1 = 1\} = p$ and $\mathsf{P}\{\xi_1 = 0\} = q$), with $p < x < 1$, one has
$$\lim_n \frac{1}{n}\ln \mathsf{P}\{S_n \ge nx\} = -H(x), \qquad (**)$$
where (comp. with the notation in [P §1.6])
$$H(x) = x\ln\frac{x}{p} + (1 - x)\ln\frac{1 - x}{1 - p}.$$
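
Numerical illustration (not part of the original text). Relation (**) can be checked numerically in the Bernoullian case by computing the exact binomial tail in logarithmic scale and comparing $-\frac{1}{n}\ln\mathsf{P}\{S_n \ge nx\}$ with $H(x)$. The function names below are illustrative assumptions; the log-sum-exp step is only there to avoid floating-point underflow.

```python
import math

def rate_H(x, p):
    """H(x) = x ln(x/p) + (1-x) ln((1-x)/(1-p)) for p < x < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def log_binom_tail(n, p, x):
    """ln P{S_n >= n x} for S_n ~ Binomial(n, p), computed by log-sum-exp."""
    k0 = math.ceil(n * x)
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
        for k in range(k0, n + 1)
    ]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

if __name__ == "__main__":
    p, x = 0.3, 0.5
    for n in (100, 1000, 10000):
        print(n, -log_binom_tail(n, p, x) / n, rate_H(x, p))
```

The empirical rate decreases towards $H(x)$ from above; the gap is of order $(\ln n)/n$, which is the polynomial prefactor that the logarithmic limit in (**) ignores.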


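Problem 4.5.7. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables with $\mathsf{E}\xi_1 = 0$ and $\mathsf{D}\xi_1 = 1$ and let $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$. Let
$(x_n)_{n \ge 1}$ be some sequence of real numbers, chosen so that $x_n \to \infty$ and $\frac{x_n}{\sqrt{n}} \to 0$
as $n \to \infty$.

Prove that
$$\mathsf{P}\{S_n \ge x_n\sqrt{n}\} = \exp\Bigl\{-\frac{x_n^2}{2}(1 + y_n)\Bigr\},$$
where $y_n \to 0$, $n \to \infty$.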


Problem 4.5.8. By using (**), conclude that in the Bernoullian case (i.e., when
$\mathsf{P}\{\xi_1 = 1\} = p$ and $\mathsf{P}\{\xi_1 = 0\} = q$) one can claim that:
(a) For $p < x < 1$ and for $x_n = n(x - p)$ one has
$$\mathsf{P}\{S_n \ge np + x_n\} = \exp\Bigl\{-nH\Bigl(p + \frac{x_n}{n}\Bigr)(1 + o(1))\Bigr\}. \qquad (***)$$
(b) For $x_n = a_n\sqrt{npq}$, with $a_n \to \infty$ and $\frac{a_n}{\sqrt{n}} \to 0$, one has
$$\mathsf{P}\{S_n \ge np + x_n\} = \exp\Bigl\{-\frac{x_n^2}{2npq}(1 + o(1))\Bigr\}. \qquad (****)$$

Compare the relations (***) and (****) and then compare these two relations with
the respective results in [P §1.6].


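Problem 4.5.9. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables, all
distributed according to the Cauchy law with density $f(x) = \frac{1}{\pi(1 + x^2)}$, $x \in \mathbb{R}$. Prove
that
$$\lim_n \mathsf{P}\Bigl\{\frac{1}{n}\max_{1 \le k \le n} \xi_k < x\Bigr\} = e^{-\frac{1}{\pi x}}.$$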

Problem 4.5.10. Let $\xi_1, \xi_2, \ldots$ be independent and identically distributed random
variables with $\mathsf{E}|\xi_1| < \infty$. Prove that
$$\lim_n \frac{1}{n}\mathsf{E}\Bigl(\max_{1 \le k \le n} |\xi_k|\Bigr) = 0.$$

(Comp. with the statement in Problem 4.3.8.)


Problem 4.5.11. Suppose that $\xi$ is some random variable chosen so that $\mathsf{E}\xi = 0$
and $a \le \xi \le b$, for some constants $a$ and $b$. Show that the moment generating
function of $\xi$ satisfies the relation
$$\mathsf{E}e^{h\xi} \le e^{\frac{1}{8}h^2(b - a)^2} \quad \text{for all } h > 0.$$

Hint. Use the fact that the function $x \mapsto e^{hx}$ is convex.
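
Numerical illustration (not part of the original text). By convexity, $e^{hx}$ lies below its chord on $[a, b]$, so the moment generating function is largest for the centred two-point distribution on $\{a, b\}$. The sketch below evaluates that extremal case against the bound $e^{h^2(b-a)^2/8}$; the function names and the choice $a = -1$, $b = 2$ are illustrative assumptions.

```python
import math

def mgf_two_point(a, b, h):
    """E exp(h*xi) for the mean-zero variable xi taking values a < 0 < b."""
    p_b = -a / (b - a)          # P{xi = b}, chosen so that E xi = 0
    p_a = b / (b - a)           # P{xi = a}
    return p_a * math.exp(h * a) + p_b * math.exp(h * b)

def hoeffding_bound(a, b, h):
    """The bound exp(h**2 * (b - a)**2 / 8) from Problem 4.5.11."""
    return math.exp(h * h * (b - a) ** 2 / 8.0)

if __name__ == "__main__":
    for h in (0.1, 0.5, 1.0, 2.0):
        print(h, mgf_two_point(-1.0, 2.0, h), hoeffding_bound(-1.0, 2.0, h))
```

The gap between the two columns is smallest for small $h$, where both sides behave like $1 + O(h^2)$.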


Problem 4.5.12. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed
Bernoulli random variables with $\mathsf{P}\{\xi_i = 1\} = p$, $\mathsf{P}\{\xi_i = 0\} = q$, $p + q = 1$, and
let $S_n = \xi_1 + \ldots + \xi_n$. Prove the Chernoff inequalities: for any $x > 0$ one has
$$\mathsf{P}\{S_n - np \ge nx\} \le e^{-2nx^2},$$
$$\mathsf{P}\{|S_n - np| \ge nx\} \le 2e^{-2nx^2}.$$
Hint. Just as in many of the following problems, here one must use the relation
$$\mathsf{P}\{S_n \ge y\} \le e^{-hy}\, \mathsf{E}e^{hS_n}, \quad y > 0,\ h \ge 0,$$
which is often referred to as the Bernstein inequality.
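
Numerical illustration (not part of the original text). The one-sided Chernoff inequality of Problem 4.5.12 can be compared with the exact binomial tail for moderate $n$. In the sketch below the threshold is passed as an integer `k0` (meant to equal $n(p + x)$), to avoid rounding issues; the function names are illustrative assumptions.

```python
import math

def binom_upper_tail(n, p, k0):
    """Exact P{S_n >= k0} for S_n ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(k0, n + 1))

def chernoff_bound(n, x):
    """The bound exp(-2 n x^2) on P{S_n - n p >= n x}."""
    return math.exp(-2.0 * n * x * x)

if __name__ == "__main__":
    n, p = 200, 0.3
    for k0 in (70, 80, 90):                    # thresholds n*(p + x) with x = 0.05, 0.10, 0.15
        x = k0 / n - p
        print(k0, binom_upper_tail(n, p, k0), chernoff_bound(n, x))
```

The bound is not tight (it does not depend on $p$), but it decays at the correct Gaussian-type rate in $x$.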


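Problem 4.5.13. Prove that, in the setting of the previous problem, the following
stronger result, known as “the maximal inequalities”, is in force:
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} (S_k - kp) \ge nx\Bigr\} \le e^{-2nx^2},$$
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} |S_k - kp| \ge nx\Bigr\} \le 2e^{-2nx^2}.$$
Hint. Use the exponential analog of the Kolmogorov inequality
$$\mathsf{P}\Bigl\{\max_{1 \le k \le n} (S_k - kp) \ge \varepsilon\Bigr\} \le e^{-h\varepsilon}\, \mathsf{E}e^{h(S_n - np)}$$
(see Problem 4.2.23).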

Problem 4.5.14. Let $\xi_1, \ldots, \xi_n$ be independent (but not necessarily identically
distributed) random variables with values in the interval $[0, 1]$ and let
$S_n = \xi_1 + \ldots + \xi_n$. Setting $p = \frac{\mathsf{E}S_n}{n}$ and $q = 1 - p$, prove that, for every $0 \le x < q$, the
following inequality is in force
$$\mathsf{P}\{S_n - \mathsf{E}S_n \ge nx\} \le e^{n\psi(x)},$$
where
$$\psi(x) = \ln\Bigl[\Bigl(\frac{p}{p + x}\Bigr)^{p + x}\Bigl(\frac{q}{q - x}\Bigr)^{q - x}\Bigr].$$

Hint. First, use the inequalities
$$e^{hy}\, \mathsf{P}\{S_n \ge y\} \le \mathsf{E}e^{hS_n} = \mathsf{E}e^{hS_{n-1}}\, \mathsf{E}e^{h\xi_n} \le \mathsf{E}e^{hS_{n-1}}(1 - p + pe^h) \le \ldots \le (1 - p + pe^h)^n,$$
and then choose $h > 0$ accordingly.


Problem 4.5.15. Let everything be as in the previous problem. Prove the Hoeffding
inequality, which is a generalization of the Chernoff inequality from
Problem 4.5.12: for any $x > 0$ one has
$$\mathsf{P}\{S_n - \mathsf{E}S_n \ge nx\} \le e^{-2nx^2},$$
$$\mathsf{P}\{|S_n - \mathsf{E}S_n| \ge nx\} \le 2e^{-2nx^2}.$$

Hint. Use the result established in the previous problem and remark that
$\psi(x) \le -2x^2$.


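Problem 4.5.16. Let $\xi_1, \ldots, \xi_n$ be independent random variables with values in the
interval $[0, 1]$. Prove that for every $\varepsilon > 0$ the following inequalities are in force:
$$\mathsf{P}\{S_n \le (1 - \varepsilon)\mathsf{E}S_n\} \le \exp\Bigl\{-\frac{1}{2}\varepsilon^2\, \mathsf{E}S_n\Bigr\},$$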


P{S, > (1+ e)ESy} < exp {—[(1 + 6) In(1 + £) = EJES} ( < e777). 


Hint. For the proof of the first inequality use the result from Problem 4.5.14 and
remark that $\psi(-xp) \le -px^2/2$, $0 \le x < 1$. For the proof of the second inequality
use the hint for Problem 4.5.14, which implies the relation
$$\mathsf{P}\{S_n - \mathsf{E}S_n \ge nx\} \le \bigl[e^{-h(p + x)}(1 - p + pe^h)\bigr]^n.$$
Problem 4.5.17. Let $\xi_1,\ldots,\xi_n$ be independent random variables, chosen so that $a_i\le\xi_i\le b_i$, for some constants $a_i$ and $b_i$, $i=1,\ldots,n$. As a generalization of the Hoeffding inequality from Problem 4.5.15, prove that, for any $x>0$, one has

$$P\{S_n-ES_n\ge x\}\le\exp\Big\{-\frac{2x^2}{\sum_{k=1}^{n}(b_k-a_k)^2}\Big\},$$
$$P\{|S_n-ES_n|\ge x\}\le2\exp\Big\{-\frac{2x^2}{\sum_{k=1}^{n}(b_k-a_k)^2}\Big\}.$$

Hint. First, use the inequality established in Problem 4.5.11 to derive the estimates

$$P\{S_n-ES_n\ge x\}\le e^{-hx}Ee^{h(S_n-ES_n)}\le\exp\Big\{-hx+\frac{h^2}{8}\sum_{k=1}^{n}(b_k-a_k)^2\Big\},$$

and then choose $h$ accordingly.
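
The last step of the hint is a one-variable minimization. The sketch below spells it out under the assumption that the exponent in the hint is $-hx+\frac{h^2}{8}\sum_{k}(b_k-a_k)^2$; the symbols $g$, $D$ and $h^{*}$ are shorthand introduced here and do not appear in the book.

```latex
\[
g(h) = -hx + \frac{h^{2}}{8}\,D, \qquad D = \sum_{k=1}^{n}(b_k-a_k)^{2},
\]
\[
g'(h) = -x + \frac{h}{4}\,D = 0
\;\Longrightarrow\;
h^{*} = \frac{4x}{D},
\qquad
g(h^{*}) = -\frac{4x^{2}}{D} + \frac{2x^{2}}{D} = -\frac{2x^{2}}{D}.
\]
```

Applying the same bound to $-\xi_1,\ldots,-\xi_n$ and adding the two estimates gives the factor 2 in the two-sided inequality.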


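Problem 4.5.18. ("Large deviations.") Let $(\xi_n)_{n\ge1}$ be any sequence of independent standard Gaussian random variables (i.e., $\mathrm{Law}(\xi_n)=\mathcal N(0,1)$) and let $S_n=\xi_1+\ldots+\xi_n$. Prove that for any set $A\in\mathcal B(\mathbb R)$

$$\lim_{n\to\infty}\frac1n\ln P\Big\{\frac{S_n}{n}\in A\Big\}=-\operatorname{ess\,inf}\Big\{\frac{x^2}{2}:x\in A\Big\}.$$

(Given any real Borel function $f(x)$ defined on $(\mathbb R,\mathcal B(\mathbb R))$, by definition, $\operatorname{ess\,inf}\{f(x):x\in A\}$ is understood as $\sup\{c\in\mathbb R:\lambda\{x\in A:f(x)<c\}=0\}$, where $\lambda$ is the Lebesgue measure; compare with the definition of essential supremum in Remark 3 in [P §2.10].)

Hint. The following relation "nearly holds" for a "very large" $n$:

$$P\Big\{\frac{S_n}{n}\in A\Big\}=\sqrt{\frac{n}{2\pi}}\int_Ae^{-nx^2/2}\,dx.$$


Problem 4.5.19. Let $\xi=(\xi_1,\ldots,\xi_n)$ be some Gaussian vector, such that $E\xi_i=0$, $i=1,\ldots,n$. Prove that

$$\lim_{r\to\infty}\frac{1}{r^2}\ln P\Big\{\max_{1\le i\le n}\xi_i\ge r\Big\}=-\frac{1}{2\sigma^2},$$

where $\sigma=\max_{1\le i\le n}(E\xi_i^2)^{1/2}$.

Hint. Setting $\sigma=\max_{1\le i\le n}(E\xi_i^2)^{1/2}$, show that, for every $r\ge0$,

$$P\Big\{\max_{1\le i\le n}\xi_i\ge E\max_{1\le i\le n}\xi_i+\sigma r\Big\}\le e^{-r^2/2},$$

and then check that

$$P\Big\{\max_{1\le i\le n}\xi_i\ge r\Big\}\ge P\{\xi_i\ge r\}\ge\frac{(r/\sigma_i)\exp\{-r^2/(2\sigma_i^2)\}}{\sqrt{2\pi}\,(1+r^2/\sigma_i^2)},\qquad\sigma_i=(E\xi_i^2)^{1/2},$$

for every $1\le i\le n$.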


Chapter 5 
Stationary (in Strict Sense) Random Sequences 
and Ergodic Theory 


5.1 Stationary (in Strict Sense) Random Sequences: 
Measure-Preserving Transformations 


Problem 5.1.1. Let $T$ be any measure-preserving transformation acting on the sample space $\Omega$ and let $\xi=\xi(\omega)$, $\omega\in\Omega$, be any random variable, chosen so that the expected value $E\xi(\omega)$ exists. Prove that $E\xi(\omega)=E\xi(T\omega)$.

Hint. If $\xi=I_A$, $A\in\mathcal F$, then the identity $E\xi(\omega)=E\xi(T\omega)$ follows from the definition of a "measure-preserving transformation." By linearity, this property extends to all random variables $\xi$ of the form $\sum_{k=1}^{n}\lambda_kI_{A_k}$, $A_k\in\mathcal F$. In addition, for $\xi\ge0$, one has to use the construction of the expected value as an "integral," in conjunction with the monotone convergence theorem. For a general $\xi$, use the representation $\xi=\xi^{+}-\xi^{-}$.


Problem 5.1.2. Prove that the transformation $T$ from [P §5.1, Examples 1 and 2] is measure-preserving.¹

Hint. (Example 2) The identity $P(A)=P(T^{-1}(A))$ is trivial for sets $A$ of the form $A=[a,b)\subseteq[0,1)$. For the general case, consider the system

$$\mathcal M=\{A\in\mathcal B([0,1)):P(A)=P(T^{-1}(A))\}$$

and, by using "the suitable sets method," prove that $\mathcal M=\mathcal B([0,1))$.


Problem 5.1.3. Let $\Omega=[0,1)$, let $\mathcal F=\mathcal B([0,1))$, and let $P$ be any probability measure on $(\Omega,\mathcal F)$, chosen so that the associated distribution function on $[0,1)$ is continuous. Prove that the transformations $Tx=\lambda x$, $0<\lambda<1$, and $Tx=x^2$ are not measure-preserving.

Hint. Due to the continuity assumption, it is possible to find some points $a,b\in(0,1)$, such that

$$P([0,a))=\frac13,\qquad P([0,b))=\frac23.$$

By using this property one can easily show that the transformations $Tx=\lambda x$, $0<\lambda<1$, and $Tx=x^2$ are not measure-preserving.

¹ It is assumed throughout the entire chapter that the probability space $(\Omega,\mathcal F,P)$ is complete.


Problem 5.1.4. Let $\Omega$ denote the space of all real sequences of the form $\omega=(\ldots,\omega_{-1},\omega_0,\omega_1,\ldots)$, and let $\mathcal F$ denote the $\sigma$-algebra generated by all cylinder sets

$$\{\omega:(\omega_k,\ldots,\omega_{k+n-1})\in B_n\},$$

for all possible choices of $n=1,2,\ldots$, $k=0,\pm1,\pm2,\ldots$, and $B_n\in\mathcal B(\mathbb R^n)$. Given some probability measure $P$ on $(\Omega,\mathcal F)$, prove that the double-sided transformation $T$, defined by

$$T(\ldots,\omega_{-1},\omega_0,\omega_1,\ldots)=(\ldots,\omega_0,\omega_1,\omega_2,\ldots),$$

is measure-preserving if and only if

$$P\{\omega:(\omega_0,\ldots,\omega_{n-1})\in B_n\}=P\{\omega:(\omega_k,\ldots,\omega_{k+n-1})\in B_n\}$$

for all $n=1,2,\ldots$, all $k=0,\pm1,\pm2,\ldots$, and all $B_n\in\mathcal B(\mathbb R^n)$.


Problem 5.1.5. Let $\xi_0,\xi_1,\ldots$ be a stationary sequence of random elements with values in the Borel space $S$ (see [P §2.7, Definition 9]). Prove that one can construct (perhaps on some enlargement of the underlying probability space) random elements $\xi_{-1},\xi_{-2},\ldots$, with values in $S$, so that the double-sided sequence $\ldots,\xi_{-1},\xi_0,\xi_1,\ldots$ is stationary.


Problem 5.1.6. Let $(\Omega,\mathcal F,P)$ be any probability space, let $T$ be any measurable transformation of $\Omega$, and let $\mathcal E$ be any $\pi$-system of subsets of $\Omega$ that generates $\mathcal F$ (i.e., $\pi(\mathcal E)=\mathcal F$). Prove that if the identity $P(T^{-1}A)=P(A)$ holds for all $A\in\mathcal E$, then it must hold for all $A\in\mathcal F$.


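Problem 5.1.7. Let $T$ be any measure-preserving transformation on $(\Omega,\mathcal F,P)$ and let $\mathcal G$ be any sub-$\sigma$-algebra of $\mathcal F$. Prove that the following relation must hold for every $A\in\mathcal F$:

$$P(A\mid\mathcal G)(T\omega)=P(T^{-1}A\mid T^{-1}\mathcal G)(\omega)\quad(P\text{-a.e.}).\qquad(*)$$

In particular, if $\Omega$ is taken to be the space $\mathbb R^\infty$ of all real sequences of the form $\omega=(\omega_0,\omega_1,\ldots)$, if $\xi_k(\omega)=\omega_k$, $k\ge0$, is the associated family of coordinate maps on $\mathbb R^\infty$, and if $T$ denotes the shift-transformation on $\mathbb R^\infty$, given by $T(\omega_0,\omega_1,\ldots)=(\omega_1,\omega_2,\ldots)$ (i.e., $\xi_k(T\omega)=\omega_{k+1}$, $k\ge0$), then $(*)$ can be written as

$$P(A\mid\xi_n)(T\omega)=P(T^{-1}A\mid\xi_{n+1})(\omega)\quad(P\text{-a.e.}).$$
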
Problem 5.1.8. Let $T$ be any measurable transformation acting on $(\Omega,\mathcal F)$ and let $\mathcal P$ stand for the collection of all probability measures, $P$, on $(\Omega,\mathcal F)$ with the property that $T$ is $P$-measure-preserving. Prove that:

(a) The set of measures $\mathcal P$ is convex.

(b) The transformation $T$ is an ergodic transformation of the measure $P$ if and only if $P$ is an extremal element of the set $\mathcal P$ (i.e., $P$ cannot be written as $P=\lambda_1P_1+\lambda_2P_2$, for some $\lambda_1>0$ and $\lambda_2>0$ with $\lambda_1+\lambda_2=1$ and some $P_1,P_2\in\mathcal P$ with $P_1\ne P_2$).


Problem 5.1.9. Let $T$ be any measure-preserving transformation acting on the probability space $(\Omega,\mathcal F,P)$ and let $\xi=\xi(\omega)$, $\omega\in\Omega$, be any random variable on that space. Prove that $\xi=\xi(\omega)$ is almost invariant under $T$ (i.e., $\xi(\omega)=\xi(T\omega)$ ($P$-a.e.)) if and only if for any bounded and $\mathcal F\otimes\mathcal B(\mathbb R)$-measurable function $G(\omega,x)$ one can write

$$EG(\omega,\xi(\omega))=EG(T\omega,\xi(\omega)).$$

Hint. Consider first functions $G(\omega,x)$ of the form $G_1(\omega)G_2(x)$.


5.2 Ergodicity and Mixing 


Problem 5.2.1. Prove that the random variable $\xi$ is invariant if and only if it is $\mathcal J$-measurable.


Problem 5.2.2. Prove that the set $A$ is almost invariant if and only if $P(T^{-1}A\setminus A)=0$. Show also that if the random variable $X$ is almost invariant (i.e., $X(\omega)=X(T\omega)$ ($P$-a.e.)), then one can find an (everywhere) invariant random variable $\widetilde X=\widetilde X(\omega)$ (i.e., $\widetilde X(\omega)=\widetilde X(T\omega)$ for all $\omega\in\Omega$) with the property $P\{X(\omega)=\widetilde X(\omega)\}=1$.


Problem 5.2.3. Prove that the transformation $T$ is mixing if and only if for any two random variables $\xi$ and $\eta$, with $E\xi^2<\infty$ and $E\eta^2<\infty$, one has

$$E\xi(T^n\omega)\eta(\omega)\to E\xi(\omega)\,E\eta(\omega)\quad\text{as }n\to\infty.\qquad(*)$$

Hint. If $\eta=I_A$ and $\xi=I_B$, then $(*)$ is precisely the mixing property. Each of the variables $\xi$ and $\eta$ is in $L^2$ and can be approximated (in the metric of $L^2$) with any precision by linear combinations of indicator functions. The required convergence of the expected values then follows easily from the mixing property.


Problem 5.2.4. Give an example of a measure-preserving ergodic transformation which is not mixing.

Hint. Take $\Omega=\{a,b\}$, set $P(\{a\})=P(\{b\})=1/2$, and consider the transformation $T$, given by $Ta=b$ and $Tb=a$.
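
The two-point example from the hint is small enough to check exhaustively. The sketch below is an illustration only; the names `prob`, `preimage`, `all_events`, `invariant_events` and `joint_probabilities` are hypothetical helpers introduced here. It uses exact rational arithmetic to list the invariant events and to compute $P(A\cap T^{-n}B)$ for $A=B=\{a\}$.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = ("a", "b")
P = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
T = {"a": "b", "b": "a"}  # the swap T a = b, T b = a


def prob(event):
    """P(event) for a finite set of outcomes."""
    return sum((P[w] for w in event), Fraction(0))


def preimage(event, n=1):
    """T^{-n}(event) = {w : T^n w lies in event}."""
    def iterate(w):
        for _ in range(n):
            w = T[w]
        return w
    return frozenset(w for w in OMEGA if iterate(w) in event)


def all_events():
    """All subsets of OMEGA."""
    return [frozenset(c) for r in range(len(OMEGA) + 1)
            for c in combinations(OMEGA, r)]


def invariant_events():
    """Events E with T^{-1} E = E."""
    return [e for e in all_events() if preimage(e) == e]


def joint_probabilities(a, b, n_max):
    """The sequence P(a ∩ T^{-n} b), n = 0, ..., n_max - 1."""
    return [prob(a & preimage(b, n)) for n in range(n_max)]
```

For $A=B=\{a\}$ the sequence $P(A\cap T^{-n}B)$ alternates between $1/2$ and $0$, so it has no limit and $T$ is not mixing, while its Cesàro averages over an even number of terms equal $P(A)P(B)=1/4$. The only invariant events are $\emptyset$ and $\Omega$, which is the ergodicity property.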

Problem 5.2.5. Let $T$ be a measure-preserving transformation acting on $(\Omega,\mathcal F,P)$ and let $\mathcal F=\sigma(\mathcal A)$, where $\mathcal A$ is some algebra of subsets of $\Omega$. Suppose that in [P §5.2, Definition 4] one assumes that

$$\lim_{n\to\infty}P(A\cap T^{-n}B)=P(A)P(B)$$

only for sets $A$ and $B$ that are chosen from $\mathcal A$. Prove that the above property will then be satisfied for all sets $A$ and $B$ that belong to $\mathcal F=\sigma(\mathcal A)$ (and, as a result, one can claim that $T$ is mixing).

Show also that this statement remains valid if $\mathcal A$ is required to be a $\pi$-system and $\mathcal F=\pi(\mathcal A)$.


Problem 5.2.6. Let $(\Omega,\mathcal F)=(\mathbb R^\infty,\mathcal B(\mathbb R^\infty))$ and suppose that $T$ is the usual shift-transformation on $\Omega$, given by $T(x_1,x_2,\ldots)=(x_2,x_3,\ldots)$, for any $\omega=(x_1,x_2,\ldots)$. Prove that any event from $\mathcal F=\mathcal B(\mathbb R^\infty)$ that is invariant under $T$ must be a "tail" event; in other words, the entire $\sigma$-algebra $\mathcal J$, which comprises all $T$-invariant sets, is included in the "tail" $\sigma$-algebra $\mathcal X=\bigcap_n\mathcal F_n^\infty$, where $\mathcal F_n^\infty=\sigma(\omega:x_n,x_{n+1},\ldots)$. Give examples of tail events which are not $T$-invariant.


Problem 5.2.7. By providing appropriate examples of measure-preserving transformations $T$, acting on $(\Omega,\mathcal F,P)$, prove that: (a) $A\in\mathcal F$ does not entail $TA\in\mathcal F$; (b) one cannot conclude from $A\in\mathcal F$ and $TA\in\mathcal F$ that $P(A)=P(TA)$.


5.3 Ergodic Theorems 


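Problem 5.3.1. Let $\xi=(\xi_1,\xi_2,\ldots)$ be some stationary Gaussian sequence with $E\xi_n=0$ and with covariance function $R(n)=E\xi_{k+n}\xi_k$. Prove that the condition $R(n)\to0$ is sufficient for claiming that the measure-preserving transformation, associated with the sequence $\xi$, is mixing (and is therefore ergodic).

Hint. If $A=\{\omega:(\xi_1,\xi_2,\ldots)\in A_0\}$, $B=\{\omega:(\xi_1,\xi_2,\ldots)\in B_0\}$ and $B_n=\{\omega:(\xi_n,\xi_{n+1},\ldots)\in B_0\}$, then one must show that

$$P(A\cap B_n)\to P(A)P(B)\quad\text{as }n\to\infty.$$

The proof can then be established with the following line of reasoning:

1. Given any $\varepsilon>0$, find a number $m\in\mathbb N=\{1,2,\ldots\}$ and sets $\widetilde A_0\in\mathcal B(\mathbb R^m)$ and $\widetilde B_0\in\mathcal B(\mathbb R^m)$, such that $P(A\,\triangle\,\widetilde A)<\varepsilon$ and $P(B\,\triangle\,\widetilde B)<\varepsilon$, where $\widetilde A=\{\omega:(\xi_1,\ldots,\xi_m)\in\widetilde A_0\}$ and $\widetilde B=\{\omega:(\xi_1,\ldots,\xi_m)\in\widetilde B_0\}$.

2. Then choose some open sets $\widehat A_0\in\mathcal B(\mathbb R^m)$ and $\widehat B_0\in\mathcal B(\mathbb R^m)$, so that for the sets

$$\widehat A=\{\omega:(\xi_1,\ldots,\xi_m)\in\widehat A_0\}\quad\text{and}\quad\widehat B=\{\omega:(\xi_1,\ldots,\xi_m)\in\widehat B_0\}$$

one has

$$P(\widetilde A\,\triangle\,\widehat A)<\varepsilon\quad\text{and}\quad P(\widetilde B\,\triangle\,\widehat B)<\varepsilon.$$

This would then imply that $P(A\,\triangle\,\widehat A)<2\varepsilon$ and $P(B\,\triangle\,\widehat B)<2\varepsilon$.

3. The sets $\widehat B_n=\{\omega:(\xi_n,\ldots,\xi_{n+m-1})\in\widehat B_0\}$ have the property $P(B_n\,\triangle\,\widehat B_n)<2\varepsilon$.

4. Let $\widehat P$ stand for the probability distribution of the vector $(\xi_1,\ldots,\xi_m)$ and let $Q_n$ be the distribution of the vector $(\xi_1,\ldots,\xi_m,\xi_n,\ldots,\xi_{n+m-1})$. Then

$$R(n)\to0\;\Longrightarrow\;Q_n\xrightarrow{\,w\,}\widehat P\otimes\widehat P\quad\text{as }n\to\infty.$$

5. In conjunction with [P §3.1, Theorem 1], step 4 gives

$$\liminf_n P(\widehat A\cap\widehat B_n)\ge P(\widehat A)P(\widehat B),$$

which, taking into account the relations (see above) $P(A\,\triangle\,\widehat A)<2\varepsilon$ and $P(B_n\,\triangle\,\widehat B_n)<2\varepsilon$, gives

$$\liminf_n P(A\cap B_n)\ge(P(A)-2\varepsilon)(P(B)-2\varepsilon)-4\varepsilon,$$

which, taking into account that $\varepsilon>0$ is arbitrarily chosen, gives

$$\liminf_n P(A\cap B_n)\ge P(A)P(B).$$

6. In analogous fashion one can prove that

$$\limsup_n P(A\cap B_n)\le P(A)P(B)$$

(instead of the open sets $\widehat A_0$ and $\widehat B_0$ one must choose closed sets).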


Problem 5.3.2. Prove that for any sequence $\xi=(\xi_1,\xi_2,\ldots)$ that consists of independent and identically distributed random variables, one can claim that the associated measure-preserving transformation is mixing.

Hint. (Observe that the ergodicity of the sequence $\xi$ follows from the "zero-one law.") The proof of the mixing property can be established with the following line of reasoning:

1. Define the sets

$$A=\{\omega:(\xi_1,\xi_2,\ldots)\in A_0\}\quad\text{and}\quad B=\{\omega:(\xi_1,\xi_2,\ldots)\in B_0\},$$

for some choice of $A_0,B_0\in\mathcal B(\mathbb R^\infty)$. Given any $\varepsilon>0$, it is possible to find an integer $m\in\mathbb N=\{1,2,\ldots\}$ and a set $\widetilde A_0\in\mathcal B(\mathbb R^m)$, so that $P(A\,\triangle\,\widetilde A)<\varepsilon$ for $\widetilde A=\{\omega:(\xi_1,\ldots,\xi_m)\in\widetilde A_0\}$.

2. Define the sets

$$B_n=\{\omega:(\xi_n,\xi_{n+1},\ldots)\in B_0\},\quad n\ge1,$$

and observe that for any $n>m$

$$P(\widetilde A\cap B_n)=P(\widetilde A)P(B_n)=P(\widetilde A)P(B).$$

3. Finally, prove that $|P(A\cap B_n)-P(A)P(B_n)|\le2\varepsilon$ and that $P(A\cap B_n)\to P(A)P(B)$ as $n\to\infty$.


Problem 5.3.3. Prove that the stationary sequence $\xi=(\xi_1,\xi_2,\ldots)$ is ergodic if and only if for every $k=1,2,\ldots$ and every $B\in\mathcal B(\mathbb R^k)$ one has

$$\frac1n\sum_{i=1}^{n}I_B(\xi_i,\ldots,\xi_{i+k-1})\to P\{(\xi_1,\ldots,\xi_k)\in B\}\quad\text{as }n\to\infty\quad(P\text{-a.e.}).$$

Hint. To prove the necessity part, let $Q$ denote the distribution (on $\mathbb R^\infty$) of the sequence $\xi=(\xi_1,\xi_2,\ldots)$ and let $T$ stand for the shift

$$\mathbb R^\infty\ni x=(x_1,x_2,\ldots)\mapsto T(x)=(x_2,x_3,\ldots)\in\mathbb R^\infty.$$

In addition, given any $k=1,2,\ldots$ and any $B\in\mathcal B(\mathbb R^k)$, define the function $\mathbb R^\infty\ni x=(x_1,x_2,\ldots)\mapsto f(x)=I((x_1,\ldots,x_k)\in B)\in\mathbb R$, and then apply to that function the Birkhoff–Khinchin ergodic theorem.

In order to establish the sufficiency part, one has to prove that the transformation $T$, introduced above, is ergodic; in other words, the measure of any set from the associated collection $\mathcal J$ (i.e., any invariant set) is either 0 or 1.

The property

$$\frac1n\sum_{i=1}^{n}I_B(\xi_i,\ldots,\xi_{i+k-1})\to P\{(\xi_1,\ldots,\xi_k)\in B\}\quad\text{as }n\to\infty\quad(P\text{-a.e.})$$

translates into the claim that for every set $A\in\mathcal B(\mathbb R^\infty)$, of the form $\{(x_1,\ldots,x_k)\in B\}$, for some $B\in\mathcal B(\mathbb R^k)$, one must have

$$\frac1n\sum_{i=1}^{n}I_A(T^ix)\to Q(A)\quad\text{as }n\to\infty\quad(Q\text{-a.e.}).$$

In conjunction with the Birkhoff–Khinchin ergodic theorem, the last relation yields the identity $E_Q(I_A\mid\mathcal J)=E_QI_A$ ($Q$-a.e.), which, in turn, implies that the sets $A$ of the form $\{(x_1,\ldots,x_k)\in B\}$, for some choice of $B\in\mathcal B(\mathbb R^k)$, do not depend on $\mathcal J$. By using the "suitable sets" method, one can then conclude that the collection of sets

$$\mathcal M=\{A\in\mathcal B(\mathbb R^\infty):A\text{ is independent from }\mathcal J\}$$

coincides with $\mathcal B(\mathbb R^\infty)$. Finally, one can conclude that $\mathcal J$ does not depend on $\mathcal J$ and, therefore, the $Q$-measure of every invariant set is either 0 or 1. This proves the ergodicity of the transformation $T$, and, therefore, the ergodicity of the sequence $\xi$.

Problem 5.3.4. Suppose that $T$ is some measure-preserving transformation on $(\Omega,\mathcal F)$, under two different measures, $P$ and $\widetilde P$. Prove that if, in addition, $T$ happens to be ergodic relative to both $P$ and $\widetilde P$, then either $P=\widetilde P$ or $P\perp\widetilde P$.


Problem 5.3.5. Let $T$ be any measure-preserving transformation on the space $(\Omega,\mathcal F,P)$, let $\mathcal A$ be any algebra of subsets of $\Omega$, chosen so that $\sigma(\mathcal A)=\mathcal F$, and let

$$I_A^{(n)}=\frac1n\sum_{k=0}^{n-1}I_A(T^k\omega),\quad A\in\mathcal F.$$

Prove that the transformation $T$ is ergodic if and only if at least one of the following conditions holds:

1. $I_A^{(n)}\xrightarrow{\,P\,}P(A)$ for every $A\in\mathcal A$;
2. $\lim_n\frac1n\sum_{k=0}^{n-1}P(A\cap T^{-k}B)=P(A)P(B)$ for all $A,B\in\mathcal A$;
3. $I_A^{(n)}\xrightarrow{\,P\,}P(A)$ for every $A\in\mathcal F$.


Problem 5.3.6. Suppose that $T$ is some measure-preserving transformation on $(\Omega,\mathcal F,P)$. Prove that $T$ is ergodic (for the measure $P$) if and only if there is no measure $\widetilde P\ne P$, defined on $(\Omega,\mathcal F)$, that has the property $\widetilde P\ll P$, and is such that the transformation $T$ is measure-preserving for $\widetilde P$.


Problem 5.3.7. (Bernoulli shifts.) Let $S$ be any finite set (say, $S=\{1,\ldots,N\}$), let $\Omega=S^\infty$ be the space of all sequences of the form $\omega=(\omega_0,\omega_1,\ldots)$, $\omega_i\in S$, and let $\xi_k$, $k\ge0$, be the canonical coordinate maps on $S^\infty$, given by $\xi_k(\omega)=\omega_k$, $\omega\in\Omega=S^\infty$. Define the shift transformation $T(\omega_0,\omega_1,\ldots)=(\omega_1,\omega_2,\ldots)$. The same transformation can be defined in terms of the coordinate maps through the relations $\xi_k(T\omega)=\omega_{k+1}$, $k\ge0$. Assume that to every $i\in\{1,2,\ldots,N\}$ one can attach a non-negative number, $p_i$, so that $\sum_{i=1}^{N}p_i=1$ (i.e., the list $(p_1,\ldots,p_N)$ represents a probability distribution on $S$). With the help of this distribution it is possible to define a measure $P$ on $(S^\infty,\mathcal B(S^\infty))$ (see [P §2.3]), so that

$$P\{\omega:(\omega_1,\ldots,\omega_k)=(u_1,\ldots,u_k)\}=p_{u_1}\cdots p_{u_k}.$$

In other words, the probability measure $P$ can be defined in such a way that the random variables $\xi_0(\omega),\xi_1(\omega),\ldots$ become independent. It is common to refer to the shift transformation $T$ as the Bernoulli shift or the Bernoulli transformation relative to the measure $P$.

Prove that the Bernoulli transformation, as described above, has the mixing property.


Problem 5.3.8. Let $T$ be some measure-preserving transformation on $(\Omega,\mathcal F,P)$. Setting $T^{-n}\mathcal F=\{T^{-n}A:A\in\mathcal F\}$, we say that the $\sigma$-algebra

$$\mathcal F_{-\infty}=\bigcap_{n=1}^{\infty}T^{-n}\mathcal F$$

is trivial (or $P$-trivial), if the $P$-measure of every set from $\mathcal F_{-\infty}$ is either 0 or 1. If the transformation $T$ is such that the associated $\sigma$-algebra $\mathcal F_{-\infty}$ is trivial, then we say that $T$ is a "Kolmogorov transformation." Prove that every Kolmogorov transformation is ergodic and, furthermore, has the mixing property.


Problem 5.3.9. Let $1\le p<\infty$, let $T$ be any measure-preserving transformation acting on $(\Omega,\mathcal F,P)$, and let $\xi(\omega)$ be any random variable from the space $L^p(\Omega,\mathcal F,P)$.

Prove the von Neumann ergodic theorem for $L^p(\Omega,\mathcal F,P)$: one can construct a random variable, $\eta(\omega)$, on $(\Omega,\mathcal F,P)$, for which

$$E\,\Big|\frac1n\sum_{k=0}^{n-1}\xi(T^k\omega)-\eta(\omega)\Big|^{p}\to0\quad\text{as }n\to\infty.$$


Problem 5.3.10. The Borel normal numbers theorem claims that (see [P §4.3, Example 2]) the proportion of zeroes, or of ones, in the binary expansion of $\omega\in[0,1)$ converges almost surely, relative to the Lebesgue measure on $[0,1)$, to 1/2. Prove this result by introducing the transformation $T:[0,1)\to[0,1)$, given by

$$T(\omega)=2\omega\pmod 1,$$

and by using the ergodic theorem [P §5.3, Theorem 1].
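
The link between the doubling map and binary digits can be illustrated with exact arithmetic. The sketch below is an assumption-laden experiment, not a proof: the helpers `random_dyadic`, `doubling_map` and `frequency_of_ones` are hypothetical names, and a dyadic rational with finitely many random bits stands in for a "typical" point $\omega$ (floating-point numbers would collapse to 0 after about 53 doublings, hence the use of `Fraction`). The first binary digit of $\omega$ equals 1 exactly when $\omega\ge1/2$, so the time average of $I_{[1/2,1)}$ along the orbit is the frequency of ones among the leading digits.

```python
import random
from fractions import Fraction


def random_dyadic(bits: int, seed: int = 0) -> Fraction:
    """A point k / 2^bits whose binary digits are pseudo-random (illustration only)."""
    rng = random.Random(seed)
    return Fraction(rng.getrandbits(bits), 2 ** bits)


def doubling_map(w: Fraction) -> Fraction:
    """T(w) = 2w (mod 1), computed exactly."""
    return (2 * w) % 1


def frequency_of_ones(w: Fraction, n: int) -> float:
    """Average of the indicator of [1/2, 1) along w, T w, ..., T^{n-1} w."""
    ones = 0
    for _ in range(n):
        if w >= Fraction(1, 2):  # the current first binary digit is 1
            ones += 1
        w = doubling_map(w)
    return ones / n
```

For $\omega=1/3=0.010101\ldots_2$ the orbit alternates between $1/3$ and $2/3$, so `frequency_of_ones(Fraction(1, 3), 10)` counts five ones in ten steps. For a dyadic point with `bits` random digits, only the first `bits` steps are meaningful, since the orbit reaches 0 afterwards.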


Problem 5.3.11. Let everything be as in Problem 5.3.10 and let $\omega\in[0,1)$. Consider the transformation $T:[0,1)\to[0,1)$, given by

$$T(\omega)=\begin{cases}0,&\text{if }\omega=0,\\ \{1/\omega\},&\text{if }\omega\ne0,\end{cases}$$

where $\{x\}$ denotes the fractional part of the number $x$. The so-called Gauss measure on the interval $[0,1)$ is defined as

$$P(A)=\frac{1}{\ln2}\int_A\frac{dx}{1+x},\quad A\in\mathcal B([0,1)).$$

Prove that the transformation $T$ preserves the Gauss measure $P$.
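
Before proving invariance it can help to see it numerically on intervals. The sketch below is an illustration under stated assumptions: it uses the fact that, for $0<x\le1$, the preimage $T^{-1}[0,x)$ is (up to the single point 0, which has measure zero) the disjoint union of the intervals $(1/(k+x),1/k]$, $k=1,2,\ldots$, and it truncates that union after a hypothetical number of terms. The function names are introduced here for illustration only.

```python
import math


def gauss_measure(a: float, b: float) -> float:
    """Gauss measure of an interval with endpoints a <= b inside [0, 1]."""
    return (math.log1p(b) - math.log1p(a)) / math.log(2.0)


def gauss_map(w: float) -> float:
    """T(w) = {1/w} for w != 0 and T(0) = 0."""
    if w == 0.0:
        return 0.0
    y = 1.0 / w
    return y - math.floor(y)


def preimage_measure(x: float, terms: int = 200_000) -> float:
    """Truncated sum of the Gauss measures of (1/(k+x), 1/k], k = 1..terms."""
    return sum(gauss_measure(1.0 / (k + x), 1.0 / k) for k in range(1, terms + 1))
```

Comparing `preimage_measure(x)` with `gauss_measure(0.0, x)` for a few values of `x` shows agreement up to the truncation error, which is of order `x / terms`. The exact identity follows from the telescoping of the logarithms and is the core of the proof for intervals.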


Problem 5.3.12. By providing appropriate examples, prove that the Poincaré 
“reversibility” theorem (see [ P §5.1, 3 ]) may not hold for measurable spaces with 
infinite measures. 


Chapter 6
Stationary (in Broad Sense) Random
Sequences: L²-theory


6.1 Spectral Representation of Covariance Functions 


Problem 6.1.1. By using [P §6.1, (11)], prove the relation [P §6.1, (12)].

Hint. The required statement can be established by using appropriate values for $t_i$ and $a_i$. For instance, with $m=2$, $t_1=0$ and $t_2=n$, it is easy to prove that

$$(|a_1|^2+|a_2|^2)R(0)+a_1\bar a_2R(-n)+\bar a_1a_2R(n)\ge0.$$

Setting $a_1=a_2=1$ and $a_1=1$, $a_2=i$ above, and taking into account the property $R(0)\in\mathbb R$, one finds that $R(n)+R(-n)\in\mathbb R$ and $i(R(n)-R(-n))\in\mathbb R$, and, therefore, $R(-n)=\overline{R(n)}$.


Problem 6.1.2. Prove that if all zeroes of the polynomial $Q(z)$, defined in [P §6.1, (27)], happen to be outside of the unit disk, then the auto-regression equation [P §6.1, (24)] admits a unique stationary solution, which can be written in the form of a one-sided moving average.


Problem 6.1.3. In the context of [P §6.1], prove that the spectral functions for the sequences (22) and (24) have densities given by, respectively, (23) and (29).

Hint. The formula in (23) may be established as follows: prove first that $R(n)=\sum_{k=0}^{\infty}a_{n+k}\bar a_k$ and after that verify the relation $R(n)=\int_{-\pi}^{\pi}e^{i\lambda n}f(\lambda)\,d\lambda$, where $f(\lambda)$ is given by (23). (It is useful to keep in mind that $\int_{-\pi}^{\pi}e^{i\lambda n}\,d\lambda=2\pi\delta_{n0}$, where $\delta_{n0}$ is the usual Kronecker symbol.)


Problem 6.1.4. Prove that if $\sum_{n=-\infty}^{\infty}|R(n)|^2<\infty$, then the spectral function $F(\lambda)$ has density $f(\lambda)$, given by

$$f(\lambda)=\frac{1}{2\pi}\sum_{n=-\infty}^{\infty}e^{-i\lambda n}R(n),$$

where the series converges in the complex space $L^2=L^2([-\pi,\pi),\mathcal B([-\pi,\pi)),\lambda)$, $\lambda$ being the usual Lebesgue measure.

Hint. Use the fact that $\big\{\frac{1}{\sqrt{2\pi}}e^{i\lambda n},\ n=0,\pm1,\pm2,\ldots\big\}$ is an orthonormal system in the space $L^2([-\pi,\pi),\mathcal B([-\pi,\pi)),\lambda)$.


Problem 6.1.5. Let $(\xi_n)_{n\ge0}$ be any stationary Gauss-Markov sequence with vanishing mean. Prove that the associated covariance function, $R(n)$, admits the representation

$$R(n)=\sigma^2\lambda^{n},$$

for some $0\le\lambda\le1$.

Problem 6.1.6. Let $N=(N_t)_{t\ge0}$ be a Poisson process (see [P §7.10]) of parameter $\lambda>0$. Define the (continuous-time) process $\xi_t=\xi\times(-1)^{N_t}$, where $\xi$ is some random variable, which is independent from $N$, and is chosen so that $P\{\xi=1\}=P\{\xi=-1\}=\frac12$. Prove that $E\xi_t=0$ and $E\xi_s\xi_t=e^{-2\lambda|t-s|}$, $s,t\ge0$.


Problem 6.1.7. Consider the sequence $(\xi_n)_{n\ge0}$ defined as

$$\xi_n=\sum_{k=1}^{N}a_k\cos(b_kn-\eta_k),$$

where $a_k,b_k>0$, for $k=1,\ldots,N$, are given constants, and $\eta_1,\ldots,\eta_N$ are independent random variables that are uniformly distributed in $(0,2\pi)$. Prove that $(\xi_n)_{n\ge0}$ is a stationary sequence.


Problem 6.1.8. Let $\xi_n=\cos n\varphi$, $n\ge1$, for some random variable $\varphi$ that is uniformly distributed on $[-\pi,\pi]$. Prove that the sequence $(\xi_n)_{n\ge1}$ is stationary in broad sense, but is not stationary in strict sense.


Problem 6.1.9. Consider the one-sided moving average model of order $p$ (MA($p$)):

$$\xi_n=a_0\varepsilon_n+a_1\varepsilon_{n-1}+\ldots+a_p\varepsilon_{n-p},$$

where $n=0,\pm1,\ldots$ and $\varepsilon=(\varepsilon_n)$ is a white noise sequence (see [P §6.1, Example 3]). Compute the dispersion $D\xi_n$ and the covariance $\mathrm{cov}(\xi_n,\xi_{n+k})$.
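
A candidate answer can be cross-checked mechanically. The sketch below is an illustration under assumptions that are not stated in the problem: the coefficients $a_0,\ldots,a_p$ are real and the white noise has $E\varepsilon_n=0$, $E\varepsilon_n^2=1$, $E\varepsilon_n\varepsilon_m=0$ for $n\ne m$. The function names are hypothetical. The first function expands $\mathrm{cov}(\xi_n,\xi_{n+k})$ term by term, the second uses the compact sum that this expansion suggests.

```python
def ma_autocovariance_bruteforce(a, k):
    """Sum of a_i * a_j over pairs with eps_{n-i} = eps_{n+k-j}, i.e. j = i + k."""
    total = 0.0
    for i, ai in enumerate(a):
        for j, aj in enumerate(a):
            if -i == k - j:  # same noise term appears in xi_n and xi_{n+k}
                total += ai * aj
    return total


def ma_autocovariance(a, k):
    """Compact form: sum_{j=0}^{p-|k|} a_j * a_{j+|k|}, and 0 for |k| > p."""
    k = abs(k)
    if k >= len(a):
        return 0.0
    return sum(a[j] * a[j + k] for j in range(len(a) - k))
```

For example, with the hypothetical coefficients `a = [1.0, 0.5, -0.25]` both functions give the dispersion `1.3125` at lag 0 and the covariance `0.375` at lags `1` and `-1`.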


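Problem 6.1.10. Consider the auto-regression model of order 1 (AR(1))

$$\xi_n=\alpha_0+\alpha_1\xi_{n-1}+\sigma\varepsilon_n,\quad n\ge1$$

(comp. with formula [P §6.1, (25)]) with white noise $\varepsilon=(\varepsilon_n)$ and suppose that $|\alpha_1|<1$. Prove that if $E|\xi_0|<\infty$, then

$$E\xi_n=\alpha_1^{n}E\xi_0+\frac{\alpha_0(1-\alpha_1^{n})}{1-\alpha_1}\to\frac{\alpha_0}{1-\alpha_1}\quad\text{as }n\to\infty;$$

if, furthermore, $D\xi_0<\infty$, then

$$D\xi_n=\alpha_1^{2n}D\xi_0+\frac{\sigma^2(1-\alpha_1^{2n})}{1-\alpha_1^{2}}\to\frac{\sigma^2}{1-\alpha_1^{2}}\quad\text{as }n\to\infty,$$

and

$$\mathrm{cov}(\xi_n,\xi_{n+k})\to\frac{\sigma^2\alpha_1^{k}}{1-\alpha_1^{2}}\quad\text{as }n\to\infty.$$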
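
The closed-form expressions for the mean and the dispersion can be compared with the recursions they solve. The sketch below is an illustration under the assumption that $\varepsilon_n$ has zero mean and unit variance and is uncorrelated with $\xi_{n-1}$, so that $E\xi_n=\alpha_0+\alpha_1E\xi_{n-1}$ and $D\xi_n=\alpha_1^2D\xi_{n-1}+\sigma^2$. The function names and the numerical parameter values are hypothetical.

```python
def ar1_moments(alpha0, alpha1, sigma, mean0, var0, n):
    """Iterate the one-step recursions for the mean and the dispersion n times."""
    mean, var = mean0, var0
    for _ in range(n):
        mean = alpha0 + alpha1 * mean
        var = alpha1 ** 2 * var + sigma ** 2
    return mean, var


def ar1_closed_form(alpha0, alpha1, sigma, mean0, var0, n):
    """Closed-form mean and dispersion of xi_n for |alpha1| < 1."""
    mean = alpha1 ** n * mean0 + alpha0 * (1 - alpha1 ** n) / (1 - alpha1)
    var = alpha1 ** (2 * n) * var0 + sigma ** 2 * (1 - alpha1 ** (2 * n)) / (1 - alpha1 ** 2)
    return mean, var
```

With, say, `alpha0 = 1.0`, `alpha1 = 0.6`, `sigma = 2.0`, the two functions agree for every `n`, and for large `n` both approach `alpha0 / (1 - alpha1)` and `sigma**2 / (1 - alpha1**2)`.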


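Problem 6.1.11. In the setting of the previous problem, suppose that $\xi_0$ is normally distributed with law $\mathcal N\Big(\frac{\alpha_0}{1-\alpha_1},\frac{\sigma^2}{1-\alpha_1^{2}}\Big)$. Prove that the Gaussian sequence $\xi=(\xi_n)_{n\ge0}$ is both strictly and broadly stationary, with

$$E\xi_n=\frac{\alpha_0}{1-\alpha_1},\quad D\xi_n=\frac{\sigma^2}{1-\alpha_1^{2}}\quad\text{and}\quad\mathrm{cov}(\xi_n,\xi_{n+k})=\frac{\sigma^2\alpha_1^{k}}{1-\alpha_1^{2}}.$$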


6.2 Orthogonal Stochastic Measures and Stochastic Integrals 


Problem 6.2.1. Prove the equivalence of conditions [P §6.2, (5) and (6)].

Hint. To prove the implication (5) $\Rightarrow$ (6), take $\Delta_n\downarrow\emptyset$, $\Delta_n\in\mathcal E_0$, $D_n=E\setminus\Delta_n$, $D_0=\emptyset$; then $E=\sum_{k=1}^{\infty}(D_k\setminus D_{k-1})$ and [P §6.2, (5)] implies that $Z(\Delta_n)=Z(E)-Z(D_n)\to0$.


Problem 6.2.2. Consider the function $f\in L^2$. By using the results from [P Chap. 2] (specifically, [P §2.4, Theorem 1], the Corollary to [P §2.6, Theorem 3], and Problem 2.3.8), prove that there is a sequence, $(f_n)_{n\ge1}$, that consists of functions of the form specified in [P §6.2, (10)], and is such that $\|f-f_n\|\to0$ as $n\to\infty$.

Hint. The proof may be established with the following argument. Given any $\varepsilon>0$, one can construct the simple function $g(\lambda)=\sum_{k=1}^{p}f_kI_{B_k}(\lambda)$, where $B_k\in\mathcal E$ and $f_k\in\mathbb C$, in such a way that $\|f-g\|_{L^2}<\varepsilon/2$. Then construct the sets $\Delta_k\in\mathcal E_0$, so that the quantities $m(\Delta_k\,\triangle\,B_k)$, $k=1,\ldots,p$, are as small as needed. Finally, the function $h(\lambda)=\sum_{k=1}^{p}f_kI_{\Delta_k}(\lambda)$ has the form specified in [P §6.2, (10)] and, furthermore, can be chosen so that $\|f-h\|_{L^2}<\varepsilon$.


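Problem 6.2.3. Assuming that $Z(\Delta)$ is some orthogonal stochastic measure, with structural function $m(\Delta)$, verify the following relations:

$$E|Z(\Delta_1)-Z(\Delta_2)|^2=m(\Delta_1\,\triangle\,\Delta_2),$$
$$Z(\Delta_1\setminus\Delta_2)=Z(\Delta_1)-Z(\Delta_1\cap\Delta_2)\quad(P\text{-a.e.}),$$
$$Z(\Delta_1\,\triangle\,\Delta_2)=Z(\Delta_1)+Z(\Delta_2)-2Z(\Delta_1\cap\Delta_2)\quad(P\text{-a.e.}).$$
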
Problem 6.2.4. Let $\xi=(\xi_n)$, with $E\xi_n=0$, be any stationary sequence with correlation function $R(n)$, and with spectral measure $F(d\lambda)$. Setting $S_n=\xi_1+\ldots+\xi_n$, prove that the dispersion $DS_n$ can be written in the form:

$$DS_n=\sum_{|k|<n}(n-|k|)R(k)\quad\text{or}\quad DS_n=\int_{-\pi}^{\pi}\Big(\frac{\sin\frac{n\lambda}{2}}{\sin\frac{\lambda}{2}}\Big)^{2}F(d\lambda).$$

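
The two formulas for $DS_n$ are linked by a trigonometric identity: substituting $R(k)=\int_{-\pi}^{\pi}e^{i\lambda k}F(d\lambda)$ into the first formula leads to $\sum_{|k|<n}(n-|k|)e^{i\lambda k}=\big(\sin\frac{n\lambda}{2}/\sin\frac{\lambda}{2}\big)^2$. The sketch below, with hypothetical function names, checks this identity numerically at a few points; it is an illustration, not a proof.

```python
import cmath
import math


def kernel_sum(n: int, lam: float) -> float:
    """Real part of sum_{|k| < n} (n - |k|) * exp(i * lam * k)."""
    total = sum((n - abs(k)) * cmath.exp(1j * lam * k) for k in range(-(n - 1), n))
    return total.real


def kernel_closed(n: int, lam: float) -> float:
    """(sin(n * lam / 2) / sin(lam / 2))^2 for lam not a multiple of 2*pi."""
    return (math.sin(n * lam / 2.0) / math.sin(lam / 2.0)) ** 2
```

The identity itself can be proved by writing the left-hand side as $\big|\sum_{j=0}^{n-1}e^{i\lambda j}\big|^2$ and summing the geometric progression.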

Problem 6.2.5. Suppose that $f(\lambda)$ is a spectral density (i.e., for some spectral measure $F$ one can write $F(d\lambda)=f(\lambda)\,d\lambda$), which is continuous at $\lambda=0$. By using the second formula for the dispersion $DS_n$, established in the previous problem, prove that

$$DS_n=2\pi f(0)\cdot n+o(n).$$

(The kernel $\Big(\frac{\sin\frac{n\lambda}{2}}{\sin\frac{\lambda}{2}}\Big)^{2}$ is known as the Fejér kernel; see [P §6.4, 2].)


6.3 Spectral Representations of Stationary (in Broad Sense) 
Sequences 


Problem 6.3.1. In the notation adopted in the proof of [P §6.3, Theorem 1], prove that $L_0^2(F)=L^2(F)$.

Hint. According to Problem 6.2.2, every function $f(\lambda)\in L^2(F)$ can be approximated arbitrarily closely in the norm of $L^2(F)$ with functions of the form $g(\lambda)=\sum_{k=1}^{p}f_kI_{B_k}(\lambda)$, where $B_k\in\mathcal E_0$, $\mathcal E_0$ being the algebra comprised of all finite unions of intervals of the form $[a,b)$, $-\pi\le a<b<\pi$. Consequently, it is enough to prove only that every function $I_{[a,b)}(\lambda)$ can be approximated with linear combinations of functions of the form $e_n(\lambda)=e^{i\lambda n}$, $n=0,\pm1,\pm2,\ldots$. However, a function of the form $I_{[a,b)}(\lambda)$ can be approximated with continuous functions, which, in turn, can be approximated with linear combinations of functions of the form $e_n(\lambda)$, $n=0,\pm1,\pm2,\ldots$ (the Weierstrass–Stone theorem).


Problem 6.3.2. Let $\xi=(\xi_n)$ be any stationary sequence, such that, for some fixed $N$, one can claim that $\xi_{n+N}(\omega)=\xi_n(\omega)$, $\omega\in\Omega$, for all $n\in\mathbb Z=\{0,\pm1,\pm2,\ldots\}$. Prove that the spectral representation of any such sequence comes down to the representation [P §6.1, (13)].

Hint. Since $R(N)=R(0)$, one can claim that the spectral measure $F$ is piece-wise constant on $[-\pi,\pi)$ and has jumps at the points

$$\lambda_k=\frac{2\pi k}{N}+2\pi p_k,\quad k=1,\ldots,N,$$

where the integers $p_k$ are chosen so that $\lambda_k\in[-\pi,\pi)$. As a result, the spectral representation of $\xi$ must have the form:

$$\xi_n=\int_{-\pi}^{\pi}e^{i\lambda n}Z(d\lambda)=\sum_{k=1}^{N}e^{i\lambda_kn}Z(\{\lambda_k\}).$$

k=1 


Problem 6.3.3. Let $\xi=(\xi_n)$ be any stationary sequence, chosen so that $E\xi_n=0$ and

$$\frac{1}{N^2}\sum_{k=0}^{N-1}\sum_{l=0}^{N-1}R(k-l)=\frac1N\sum_{|k|\le N-1}R(k)\Big[1-\frac{|k|}{N}\Big]\le CN^{-\alpha},$$

for some constants $C>0$ and $\alpha>0$. By using the Borel–Cantelli lemma, prove that

$$\frac1N\sum_{k=0}^{N}\xi_k\to0\quad\text{as }N\to\infty\quad(P\text{-a.e.}).$$

Problem 6.3.4. Suppose that the spectral density $f(\lambda)$ of the sequence $\xi=(\xi_m)$ is rational, in that

$$f(\lambda)=\frac{1}{2\pi}\,\frac{|P_{n-1}(e^{-i\lambda})|^2}{|Q_n(e^{-i\lambda})|^2},$$

where $P_{n-1}(z)=a_0+a_1z+\cdots+a_{n-1}z^{n-1}$ and $Q_n(z)=1+b_1z+\cdots+b_nz^{n}$ are given polynomials. In addition, suppose that $Q_n$ has no roots on the unit circle.

Prove that one can construct a white noise sequence $\varepsilon=(\varepsilon_m)$, $m\in\mathbb Z$, in such a way that the sequence $(\xi_m)$ is a component of the $n$-dimensional sequence $(\xi_m^{1},\xi_m^{2},\ldots,\xi_m^{n})$ (i.e., $\xi_m^{1}=\xi_m$), which is determined by the relations

$$\xi_{m+1}^{i}=\xi_m^{i+1}+\beta_i\varepsilon_{m+1},\quad i=1,\ldots,n-1,$$
$$\xi_{m+1}^{n}=-\sum_{j=0}^{n-1}b_{n-j}\xi_m^{j+1}+\beta_n\varepsilon_{m+1},$$

where $\beta_1=a_0$ and $\beta_i=a_{i-1}-\sum_{k=1}^{i-1}\beta_kb_{i-k}$, $i>1$.


Problem 6.3.5. One says that the stationary (in strict sense) sequence $\xi=(\xi_n)$ satisfies the strong mixing condition if

$$\alpha_n(\xi)=\sup_{A\in\mathcal F_{-\infty}^{0}(\xi),\;B\in\mathcal F_n^{\infty}(\xi)}|P(AB)-P(A)P(B)|\to0\quad\text{as }n\to\infty,$$

where $\mathcal F_{-\infty}^{0}(\xi)=\sigma(\ldots,\xi_{-1},\xi_0)$ and $\mathcal F_n^{\infty}(\xi)=\sigma(\xi_n,\xi_{n+1},\ldots)$. (Comp. with Problem 2.8.7.)

Prove that if $X$ and $Y$ are two bounded ($|X|\le C_1$ and $|Y|\le C_2$) random variables, that are measurable, respectively, for $\mathcal F_{-\infty}^{0}(\xi)$ and $\mathcal F_n^{\infty}(\xi)$, then

$$|EXY-EX\,EY|\le4C_1C_2\,\alpha_n(\xi).$$


Problem 6.3.6. Let $\xi=(\xi_m)_{-\infty<m<\infty}$ be any stationary Gaussian sequence, and let

$$\rho_n^{*}(\xi)=\sup_{X,Y}EXY,$$

the supremum being taken over all random variables $X$ and $Y$, with $E|X|^2=E|Y|^2=1$, chosen from the closed linear manifolds $L_{-\infty}^{0}(\xi)$ and $L_n^{\infty}(\xi)$, that are generated, respectively, by the families $(\xi_m)_{m\le0}$ and $(\xi_m)_{m\ge n}$.

Prove the Kolmogorov–Rozanov inequality:

$$\alpha_n(\xi)\le\rho_n^{*}(\xi)\le2\pi\alpha_n(\xi).$$

(Comp. with the inequalities in Problem 2.8.7.)

Problem 6.3.7. Suppose that $\xi=(\xi_n)$ is some stationary Gaussian sequence, that has a continuous spectral density, $f(\lambda)$, which is uniformly bounded from below by some positive constant, i.e., $f(\lambda)\ge C>0$, $\lambda\in[-\pi,\pi]$. By using the inequalities established in the previous problem, prove that the sequence $\xi$ must have the strong mixing property.


Problem 6.3.8. By considering sequences $\xi=(\xi_n)$, of the form

$$\xi_n=A\cos(\lambda n+\theta),$$

for an appropriate choice of the constant $A\ne0$ and the independent random variables $\lambda$ and $\theta$, prove that a stationary in broad sense sequence may have periodic sample paths and non-periodic covariance function.


6.4 Statistical Estimates of Covariance Functions 
and Spectral Densities 


Problem 6.4.1. Consider the estimation scheme [P §6.4, (15)] and suppose that $\xi_n\sim\mathcal N(0,1)$. Prove that for every fixed $n$ one must have

$$(N-|n|)\,D\widehat R_N(n;\xi)\to2\pi\int_{-\pi}^{\pi}(1+e^{2in\lambda})f^2(\lambda)\,d\lambda\quad\text{as }N\to\infty.$$

Hint. By using the assumption that $\xi_n$ is Gaussian for every fixed $n\ge0$, argue that

$$(N-|n|)\,D\widehat R_N(n;\xi)=2\pi\int_{-\pi}^{\pi}\int_{-\pi}^{\pi}\big[1+e^{in(\lambda+\nu)}\big]\Phi_{N-|n|}(\lambda-\nu)f(\nu)f(\lambda)\,d\nu\,d\lambda,$$

where $\Phi_{N-|n|}(\lambda)$ is the associated Fejér kernel. The required result then follows from the above relation.


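Problem 6.4.2. Prove formula [P §6.4, (16)] and its generalization:

$$\lim_{N\to\infty}\mathrm{cov}\big(\widehat f_N(\lambda;\xi),\widehat f_N(\nu;\xi)\big)=\begin{cases}2f^2(0),&\lambda=\nu=0,\pm\pi,\\ f^2(\lambda),&\lambda=\nu\ne0,\pm\pi,\\ 0,&\lambda\ne\pm\nu.\end{cases}$$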


Problem 6.4.3. Consider the first-order autoregressive model AR(1)

$$\xi_n=\theta\xi_{n-1}+\sigma\varepsilon_n,\quad n\ge1,\quad\xi_0=0,$$

in which $\varepsilon=(\varepsilon_n)$ is a Gaussian white noise sequence (comp. with [P §6.1, (25)] and with the model discussed in Problem 6.1.10). Suppose that in this model $\sigma>0$ is a known parameter, while $\theta\in\mathbb R$ is some unknown parameter, that must be estimated from the observations $\xi_1,\xi_2,\ldots$.

Let $\widehat\theta_n=\arg\max_\theta p_\theta(x_1,\ldots,x_n)$ be the maximum likelihood estimate of the parameter $\theta$, obtained from the joint probability density of $\xi_1,\ldots,\xi_n$, namely

$$p_\theta(x_1,\ldots,x_n)=\frac{1}{(2\pi\sigma^2)^{n/2}}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\theta x_{k-1})^2\Big\},\quad x_0=0.$$

Prove that

$$\widehat\theta_n=\frac{\sum_{k=1}^{n}x_{k-1}x_k}{\sum_{k=1}^{n}x_{k-1}^{2}}.$$
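
The estimator can be tried out on simulated data. The sketch below is an illustration under assumptions: the function names, the seed and the parameter values are hypothetical, and the noise is generated with Python's `random.Random.gauss`. It simulates the AR(1) path with $\xi_0=0$, evaluates the ratio $\sum x_{k-1}x_k/\sum x_{k-1}^2$, and provides the sum of squares from the exponent of the density so that the maximizing property can be checked directly.

```python
import random


def simulate_ar1(theta: float, sigma: float, n: int, seed: int = 0):
    """Return [xi_0, xi_1, ..., xi_n] with xi_0 = 0 and standard Gaussian noise."""
    rng = random.Random(seed)
    xs = [0.0]
    for _ in range(n):
        xs.append(theta * xs[-1] + sigma * rng.gauss(0.0, 1.0))
    return xs


def mle_theta(xs) -> float:
    """Ratio sum x_{k-1} x_k / sum x_{k-1}^2 over k = 1..n (xs[0] is x_0)."""
    num = sum(xs[k - 1] * xs[k] for k in range(1, len(xs)))
    den = sum(xs[k - 1] ** 2 for k in range(1, len(xs)))
    return num / den


def residual_sum_of_squares(xs, theta: float) -> float:
    """sum (x_k - theta x_{k-1})^2; maximizing the density means minimizing this."""
    return sum((xs[k] - theta * xs[k - 1]) ** 2 for k in range(1, len(xs)))
```

On the noise-free path `[0.0, 1.0, 0.5, 0.25]` the estimate is `0.5`, and on a long simulated path it lies close to the true parameter. Since the exponent of the density is quadratic in $\theta$, the estimate is also the minimizer of `residual_sum_of_squares`.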


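Problem 6.4.4. Consider the Fisher information

$$I_n(\theta)=E_\theta\Big\{-\frac{\partial^2\ln p_\theta(\xi_1,\ldots,\xi_n)}{\partial\theta^2}\Big\}$$

for the AR(1) model from Problem 6.4.3, where $E_\theta$ stands for the averaging operation under the distribution $P_\theta$ of the sequence $\xi_1,\xi_2,\ldots$.

Prove that

(a) $I_n(\theta)=\dfrac{1}{\sigma^2}\,E_\theta\sum_{k=1}^{n}\xi_{k-1}^{2}$;

(b) as $n\to\infty$, one has

$$I_n(\theta)\sim\begin{cases}\dfrac{n}{1-\theta^2},&|\theta|<1,\\[2mm]\dfrac{n^2}{2},&|\theta|=1,\\[2mm]\dfrac{\theta^{2n}}{(\theta^2-1)^2},&|\theta|>1.\end{cases}$$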


Problem 6.4.5. In the context of the AR(1) model discussed in Problems 6.4.3 and 6.4.4, prove that the maximum likelihood estimate, $\widehat\theta_n$, has the following asymptotic properties:

$$\lim_{n\to\infty}P_\theta\big\{\sqrt{I_n(\theta)}\,(\widehat\theta_n-\theta)\le x\big\}=\begin{cases}\Phi(x),&|\theta|<1,\\ H_\theta(x),&|\theta|=1,\\ \mathrm{Ch}(x),&|\theta|>1,\end{cases}$$

where $\Phi(x)$ is the distribution function of the standard normal law and $H_\theta(x)$ is the distribution function of the random variable

$$\theta\times\frac{B_1^2-1}{2\sqrt2\int_0^1B_s^2\,ds},$$

where $B=(B_s)_{0\le s\le1}$ is a standard Brownian motion (see [P §2.13]) and $\mathrm{Ch}(x)$ is the distribution function of the Cauchy distribution law with density $\frac{1}{\pi(1+x^2)}$ (see [P §2.3, Table 3]).


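Problem 6.4.6. As a continuation of the previous problem, prove that

$$\lim_{n\to\infty}P_\theta\Big\{\Big(\sum_{k=1}^{n}\xi_{k-1}^{2}\Big)^{1/2}(\widehat\theta_n-\theta)\le x\Big\}=\begin{cases}\Phi(x),&|\theta|\ne1,\\ \widetilde H_\theta(x),&|\theta|=1,\end{cases}$$

where $\widetilde H_\theta(x)$ denotes the distribution function of the random variable

$$\theta\times\frac{B_1^2-1}{2\sqrt{\int_0^1B_s^2\,ds}}.$$

Thus, if $(\widehat\theta_n-\theta)$ is normalized not by the Fisher information, but by the random variable $\big(\sum_{k=1}^{n}\xi_{k-1}^{2}\big)^{1/2}$, then one would end up with only two probability distributions instead of three.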


Problem 6.4.7. Prove that the maximum likelihood estimate, $\widehat\theta_n$, from Problem 6.4.3 is uniformly asymptotically consistent on average:

$$\sup_\theta E_\theta|\widehat\theta_n-\theta|\to0\quad\text{as }n\to\infty.$$


6.5 Wold Decomposition 


Problem 6.5.1. Prove that any stationary sequence with discrete spectrum (i.e., with spectral function $F(\lambda)$ that is piece-wise constant) must be singular.

Hint. If $\xi_n$, $n\in\mathbb Z=\{0,\pm1,\pm2,\ldots\}$, is one such sequence, then one can write

$$\xi_n=\sum_{k=-\infty}^{\infty}z_ke^{i\lambda_kn}$$

with $z_k=Z(\{\lambda_k\})$, $k\in\mathbb Z$, being orthogonal random variables with $Ez_k=0$ and $E|z_k|^2=\sigma_k^2$. Consequently, the spectral function can be written in the form $F(\lambda)=\sum_{k:\lambda_k\le\lambda}\sigma_k^2$, where $\sum_{k=-\infty}^{\infty}\sigma_k^2<\infty$. Thus, one must show that $H(\xi)=S(\xi)$, where $H(\xi)$ is the closed linear sub-space of $H^2$, generated by the random variables $\xi=(\ldots,\xi_{-1},\xi_0,\xi_1,\ldots)$, and $S(\xi)=\bigcap_nH_n(\xi)$, where each $H_n(\xi)$ is generated by the family $\xi^{n}=(\ldots,\xi_{n-1},\xi_n)$.

In order to prove that $H(\xi)=S(\xi)$, it is enough to prove that $\xi_n\in S(\xi)$, for every $n\in\mathbb Z$. However, due to the stationarity, it is enough to prove only that $\xi_0\in S(\xi)$, i.e., for every integer $N\in\mathbb Z$ and every $\delta>0$, one can find some $\eta\in H_N(\xi)$ with $\|\xi_0-\eta\|_{H^2}<\delta$. Thus, it would be enough to show that one can take $\eta=\xi_n$, for some appropriate choice of $n\le N$. For that purpose, given an arbitrary $\delta>0$, one can choose $M$ so that $\sum_{|k|>M}\sigma_k^2<\delta/2$, then prove that

$$\|\xi_n-\xi_0\|_{H^2}^{2}\le2\delta+\sum_{|k|\le M}\sigma_k^2\,|e^{i\lambda_kn}-1|^2,$$

and, finally, prove that, for any $N\in\mathbb Z$ and any $\varepsilon>0$, there is an integer $n\le N$, with $|e^{i\lambda_kn}-1|<\varepsilon$, for $|k|\le M$.


Problem 6.5.2. Let $\sigma_n^2=E|\xi_n-\widehat\xi_n|^2$, where $\widehat\xi_n=\widehat E(\xi_n\mid H_0(\xi))$. Prove that if $\sigma_n^2=0$, for some fixed $n\ge1$, then the sequence $\xi$ must be singular. If, on the other hand, $\sigma_n^2\to R(0)$ as $n\to\infty$, then $\xi$ must be regular.


Problem 6.5.3. Prove that the (automatically stationary) sequence $\xi=(\xi_n)$, of the form $\xi_n=e^{in\varphi}$, for some random variable $\varphi$, which is uniformly distributed on $[0,2\pi]$, must be regular. Find the linear estimate, $\widehat\xi_n$, of the variable $\xi_n$ and prove that the non-linear estimate

$$\widetilde\xi_n=\Big(\frac{\xi_0}{\xi_{-1}}\Big)^{n}$$

gives an error-free forecast for $\xi_n$ based on the "past history" $\xi^{0}=(\ldots,\xi_{-1},\xi_0)$, i.e.,

$$E|\widetilde\xi_n-\xi_n|^2=0,\quad n\ge1.$$

Hint. To prove the regularity of the sequence $\xi=(\xi_n)$, convince yourself that $\varepsilon_n=\xi_n/\sqrt{2\pi}$ represents a white noise sequence, and, therefore, the representation $\xi_n=\sqrt{2\pi}\,\varepsilon_n$ is of the same form as in [P §6.5, (3)].


Problem 6.5.4. Prove that the decomposition [P §6.5, (1)] into regular and
singular components is unique.


6.6 Extrapolation, Interpolation and Filtration


Problem 6.6.1. Prove that the assertion of [P §6.6, Theorem 1] remains valid even
without the assumption that $\Phi(z)$ has radius of convergence $r > 1$, while all zeroes
of $\Phi(z)$ are in the domain $|z| > 1$.


Problem 6.6.2. Prove that, for a regular process, the function $\Phi(z)$, which appears
in [P §6.6, (4)], may be written in the form


$$\Phi(z) = \sqrt{2\pi}\, \exp\Big\{ \frac{1}{2} c_0 + \sum_{k=1}^{\infty} c_k z^k \Big\}, \quad |z| < 1,$$


where


$$c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{ik\lambda} \ln f(\lambda)\, d\lambda.$$


Conclude from the above relation that the error in the one-step forecast
$\sigma_1^2 = \mathsf{E}|\hat{\xi}_1 - \xi_1|^2$ is given by the Szegő-Kolmogorov formula:


$$\sigma_1^2 = 2\pi \exp\Big\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln f(\lambda)\, d\lambda \Big\}.$$


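Hint. The Szegő-Kolmogorov formula may be established with the following
line of reasoning:
(i) First, prove that


$$\sigma_1^2 = \|\xi_1 - \hat{\xi}_1\|^2 = \int_{-\pi}^{\pi} |e^{i\lambda} - \hat{\varphi}_1(\lambda)|^2 f(\lambda)\, d\lambda, \qquad (*)$$


where $\hat{\varphi}_1(\lambda)$ is given by [P §6.6, (7)]. In conjunction with the notation adopted
in [P §6.6, Theorem 1], taking into account $(*)$, and the fact that
$f(\lambda) = \frac{1}{2\pi} |\Phi(e^{-i\lambda})|^2$, one can show that $\sigma_1^2 = |b_0|^2$.

(ii) From the first part of the problem,


$$\Phi(z) = \sum_{k=0}^{\infty} b_k z^k = \sqrt{2\pi}\, \exp\Big\{ \frac{1}{2} c_0 + \sum_{k=1}^{\infty} c_k z^k \Big\},$$


which shows that $b_0 = \sqrt{2\pi}\, \exp\{\frac{1}{2} c_0\}$, and, consequently, that


$$\sigma_1^2 = 2\pi \exp\{c_0\} = 2\pi \exp\Big\{ \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln f(\lambda)\, d\lambda \Big\}.$$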
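
Remark (numerical sketch). The Szegő-Kolmogorov formula can be checked numerically on a simple example. The code below is only an illustration under stated assumptions: the moving-average density $f(\lambda) = \frac{1}{2\pi}|1 + b e^{-i\lambda}|^2$ is a hypothetical choice that does not come from the text, and the integral is approximated by the midpoint rule. For $|b| < 1$ the one-step error should be close to 1, and for $|b| > 1$ close to $b^2$.

```python
import cmath
import math


def szego_kolmogorov(f, n_grid=4096):
    """Approximate 2*pi*exp{(1/(2*pi)) * int_{-pi}^{pi} ln f(lam) d lam}
    using the midpoint rule on n_grid equal cells."""
    h = 2.0 * math.pi / n_grid
    log_integral = h * sum(
        math.log(f(-math.pi + (j + 0.5) * h)) for j in range(n_grid)
    )
    return 2.0 * math.pi * math.exp(log_integral / (2.0 * math.pi))


def ma1_density(b):
    """Illustrative spectral density (1/(2*pi)) * |1 + b*e^{-i*lam}|^2."""
    return lambda lam: abs(1.0 + b * cmath.exp(-1j * lam)) ** 2 / (2.0 * math.pi)


# One-step forecast errors for two illustrative values of b.
print(szego_kolmogorov(ma1_density(0.5)))
print(szego_kolmogorov(ma1_density(2.0)))
```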


Problem 6.6.3. Prove [P §6.6, Theorem 2] without assuming that [ P §6.6, (22)] 
is in force. 


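Problem 6.6.4. Suppose that the signal $\theta$ and the noise $\eta$ are uncorrelated and have
spectral densities


$$f_\theta(\lambda) = \frac{1}{2\pi} \cdot \frac{1}{|1 + b_1 e^{-i\lambda}|^2} \quad \text{and} \quad f_\eta(\lambda) = \frac{1}{2\pi} \cdot \frac{1}{|1 + b_2 e^{-i\lambda}|^2}.$$


By using [P §6.6, Theorem 3], find the estimate, $\tilde{\theta}_{n+m}$, of the variable $\theta_{n+m}$ from
the observations $\xi_k$, $k \le n$, where $\xi_k = \theta_k + \eta_k$. Solve the same problem for the
spectral densities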


$$f_\theta(\lambda) = \frac{1}{2\pi}\, |2 + e^{-i\lambda}|^2 \quad \text{and} \quad f_\eta(\lambda) = \frac{1}{2\pi}.$$


Problem 6.7.1. Prove that in the observation scheme [P §6.7, (1)] the vectors $m_n$
and $\theta_n - m_n$ are uncorrelated:


$$\mathsf{E}[m_n^{*}(\theta_n - m_n)] = 0.$$


Problem 6.7.2. Suppose that in the observation scheme [P §6.7, (1)-(2)] the
variable $\gamma_0$ and all coefficients, except, perhaps, $a_0(n, \xi)$ and $A_0(n, \xi)$, are chosen
to be “event independent,” i.e., independent of $\xi$. Prove that the conditional
covariance $\gamma_n$ is also “event-independent,” in that $\gamma_n = \mathsf{E}\gamma_n$.


Problem 6.7.3. Prove that the solution to the system [P §6.7, (22)] is given by 
formula [ P §6.7, (23)]. 


Problem 6.7.4. Let $(\theta, \xi) = (\theta_n, \xi_n)$ be a Gaussian sequence, which is subject to
the following special case of the observation scheme [P §6.7, (1)]:


$$\theta_{n+1} = a\theta_n + b\varepsilon_1(n+1) \quad \text{and} \quad \xi_{n+1} = A\theta_n + B\varepsilon_2(n+1).$$


Prove that if $A \ne 0$, $b \ne 0$ and $B \ne 0$, then the limiting error of the filtration
$\gamma = \lim_{n \to \infty} \gamma_n$ exists, and is given by the positive root of the equation


$$\gamma^2 + \Big[ \frac{B^2(1 - a^2)}{A^2} - b^2 \Big] \gamma - \frac{b^2 B^2}{A^2} = 0.$$

Hint. By using formula [P §6.7, (8)], one can show that


$$\gamma_{n+1} = b^2 + \frac{a^2 c^2 \gamma_n}{c^2 + \gamma_n},$$


where $c^2 = (B/A)^2$. In other words, $\gamma_{n+1} = f(\gamma_n)$, with
$f(x) = b^2 + a^2 c^2 - \frac{a^2 c^4}{c^2 + x}$, $x \ge 0$. Furthermore, it is easy to see that $f(x)$ is
non-decreasing and bounded. From this property one can conclude that $\lim \gamma_n$ $(= \gamma)$
exists and satisfies the following equation


$$\gamma^2 + [c^2(1 - a^2) - b^2]\gamma - b^2 c^2 = 0,$$
which, due to the Viète formula, can have only one positive root.
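
Remark (numerical sketch). The fixed-point argument in the hint can be illustrated by iterating $\gamma_{n+1} = f(\gamma_n)$ and comparing the limit with the positive root of the quadratic. The function names, the starting values and the parameter values below are illustrative assumptions, not part of the problem.

```python
import math


def limiting_filter_error(a, b, A, B, gamma0=1.0, n_iter=500):
    """Iterate gamma_{n+1} = b^2 + a^2 c^2 gamma_n / (c^2 + gamma_n), c^2 = (B/A)^2."""
    c2 = (B / A) ** 2
    gamma = gamma0
    for _ in range(n_iter):
        gamma = b * b + a * a * c2 * gamma / (c2 + gamma)
    return gamma


def positive_root(a, b, A, B):
    """Positive root of gamma^2 + [c^2 (1 - a^2) - b^2] gamma - b^2 c^2 = 0."""
    c2 = (B / A) ** 2
    p = c2 * (1.0 - a * a) - b * b
    q = -b * b * c2
    return (-p + math.sqrt(p * p - 4.0 * q)) / 2.0


# Illustrative parameters (assumed): a = 0.9, b = 1, A = 1, B = 2.
print(limiting_filter_error(0.9, 1.0, 1.0, 2.0), positive_root(0.9, 1.0, 1.0, 2.0))
```

Since $f$ is increasing and concave with $f(0) = b^2 > 0$, the iteration converges from any starting value $\gamma_0 \ge 0$, which is why the choice of `gamma0` is immaterial here.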


Problem 6.7.5. (Interpolation; [80, 13.3].) Let $(\theta, \xi)$ be a partially observable
sequence, which is subject to the recursive relations [P §6.7, (1) and (2)]. Suppose
that the conditional distribution of the vector $\theta_m$, namely


$$\pi_a(m, m) = \mathsf{P}(\theta_m \le a \mid \mathscr{F}_m^{\xi}),$$


is normal.
(a) Prove that for any $n \ge m$ the conditional distribution


$$\pi_a(m, n) = \mathsf{P}(\theta_m \le a \mid \mathscr{F}_n^{\xi})$$


is also normal, i.e., $\pi_a(m, n) \sim N(\mu(m, n), \gamma(m, n))$.
(b) Find the interpolation estimate, $\mu(m, n)$, of $\theta_m$ from $\mathscr{F}_n^{\xi}$. Find also the matrix
$\gamma(m, n)$.


Problem 6.7.6. (Extrapolation; [80, 13.4].) Suppose that in the relations [P §6.7,
(1) and (2)] one has

$$a_0(n, \xi) = a_0(n) + a_2(n)\xi_n, \quad a_1(n, \xi) = a_1(n),$$
$$A_0(n, \xi) = A_0(n) + A_2(n)\xi_n, \quad A_1(n, \xi) = A_1(n).$$


(a) Prove that, with the above choice, one can claim that the distribution
$\pi_{a,b}(m, n) = \mathsf{P}(\theta_n \le a, \xi_n \le b \mid \mathscr{F}_m^{\xi})$, $n \ge m$, is normal.


(b) Find the extrapolation estimates

$$\mathsf{E}(\theta_n \mid \mathscr{F}_m^{\xi}) \quad \text{and} \quad \mathsf{E}(\xi_n \mid \mathscr{F}_m^{\xi}), \quad n \ge m.$$


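Problem 6.7.7. (Optimal control, [80, 14.3].) Consider some “controlled” and
partially observable system $(\theta_n, \xi_n)_{0 \le n \le N}$, where


$$\theta_{n+1} = u_n + \theta_n + b\varepsilon_1(n+1),$$
$$\xi_{n+1} = \theta_n + \varepsilon_2(n+1).$$


The “control” $u_n$ is $\mathscr{F}_n^{\xi}$-measurable and such that $\mathsf{E}u_n^2 < \infty$, for all $0 \le n \le N-1$.
The variables $\varepsilon_1(n)$ and $\varepsilon_2(n)$, $n = 1, \ldots, N$, are chosen as in [P §6.7, (1) and (2)]
and $\xi_0 = 0$, $\theta_0 \sim N(m, \gamma)$.

We say that the “control” $u^* = (u_0^*, \ldots, u_{N-1}^*)$ is optimal if $V(u^*) = \sup_u V(u)$,
where

$$V(u) = \mathsf{E}\Big[ \sum_{n=0}^{N-1} (\theta_n^2 + u_n^2) + \theta_N^2 \Big].$$

Prove that the optimal control exists and is given by

$$u_n^* = -[1 + P_{n+1}]^{+} P_{n+1} m_n^*, \quad n = 0, \ldots, N-1,$$


where $a^{+} = a^{-1}$ for $a \ne 0$ and $a^{+} = 0$ for $a = 0$,
and the quantities $(P_n)_{0 \le n \le N}$ are defined recursively through the relation

$$P_n = 1 + P_{n+1} - P_{n+1}^2 [1 + P_{n+1}]^{+}, \quad P_N = 1,$$

while $(m_n^*)$ is defined by the relation

$$m_{n+1}^* = u_n^* + \gamma_n^* [1 + \gamma_n^*]^{+} (\xi_{n+1} - m_n^*), \quad 0 \le n \le N-1,$$

with $m_0^* = m$ and with

$$\gamma_{n+1}^* = \gamma_n^* + 1 - (\gamma_n^*)^2 [1 + \gamma_n^*]^{+}, \quad 0 \le n \le N-1,$$


where we suppose that $\gamma_0^* = \gamma$.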


Problem 6.7.8. (Nonlinear filtering and the “change-point” detection problem;
see [117].) Typically, in statistical control, and especially in quality control, one
encounters quantities whose probabilistic nature changes abruptly at some random
moment $\theta$. This moment represents “the change-point,” say, in a particular
production process. In what follows we will describe the Bayesian formulation of the
problem of early detection of “the change-point,” and will address questions related
to the construction of sufficient statistics for this quantity.

Let $(\Omega, \mathscr{F})$ be some measurable space, let $\{\mathsf{P}^{\pi};\ \pi \in [0, 1]\}$ be some family of
probability measures on $(\Omega, \mathscr{F})$, let $\theta$ be some random variable on $(\Omega, \mathscr{F})$, which
takes values in the space of integers $\mathbb{N} = \{0, 1, 2, \ldots\}$, and, finally, let $X_1, X_2, \ldots$ be
some sequence of observable random variables, defined on $(\Omega, \mathscr{F})$. Next, suppose
that the following relations are in force:


(i) $\mathsf{P}^{\pi}\{\theta = 0\} = \pi$, $\mathsf{P}^{\pi}\{\theta = k\} = (1 - \pi) p_k$, where $p_k \ge 0$, $\sum_{k=1}^{\infty} p_k = 1$.
(ii) For every $\pi \in [0, 1]$ and every $n \ge 1$ one has


$$\mathsf{P}^{\pi}\{X_1 \le x_1, \ldots, X_n \le x_n\} = \pi\, \mathsf{P}^{1}\{X_1 \le x_1, \ldots, X_n \le x_n\}$$
$$\qquad + (1 - \pi) \sum_{k=0}^{n-1} p_{k+1}\, \mathsf{P}^{0}\{X_1 \le x_1, \ldots, X_k \le x_k\}\, \mathsf{P}^{1}\{X_{k+1} \le x_{k+1}, \ldots, X_n \le x_n\}$$
$$\qquad + (1 - \pi)(p_{n+1} + p_{n+2} + \ldots)\, \mathsf{P}^{0}\{X_1 \le x_1, \ldots, X_n \le x_n\}, \quad x_k \in \mathbb{R}.$$

(iii) $\mathsf{P}^{j}\{X_1 \le x_1, \ldots, X_n \le x_n\} = \prod_{k=1}^{n} \mathsf{P}^{j}\{X_k \le x_k\}$, $j = 0, 1$.


The practical meaning of the relations (i)-(iii) can be summarized as follows. If $\theta = 0$
or $\theta = 1$, then “the change-point” has taken place before the observation process
has begun. In this case, the variables $X_1, X_2, \ldots$ are all associated with the already
“changed” production process and are independent and identically distributed, with
distribution function $F_1(x) = \mathsf{P}^{1}\{X_1 \le x\}$. If $\theta > n$, i.e., the “change-point” occurs
after the $n$-th observation, then the random variables $X_1, \ldots, X_n$ are associated with
the “normal” production process and are independent and identically distributed,
with distribution function $F_0(x) = \mathsf{P}^{0}\{X_1 \le x\}$. If $\theta = k$, for some $1 < k \le n$,
then $X_1, \ldots, X_{k-1}$ are independent and identically distributed with distribution
function $F_0(x)$, while $X_k, \ldots, X_n$ are also independent and identically distributed,
but with distribution function $F_1(x)$. We suppose that $F_0(x) \not\equiv F_1(x)$.

Let $f_0(x)$ and $f_1(x)$ stand for the densities of the distributions $F_0(x)$ and $F_1(x)$,
with respect to some distribution, say, $(F_0(x) + F_1(x))/2$, relative to which $F_0(x)$
and $F_1(x)$ are both absolutely continuous.

Let $\tau$ denote the moment at which the “change-point” is declared. We suppose
that $\tau$ is a Markov moment relative to $(\mathscr{F}_n)_{n \ge 0}$, where $\mathscr{F}_0 = \{\varnothing, \Omega\}$ and
$\mathscr{F}_n = \sigma(X_1, \ldots, X_n)$. Essentially, $\tau$ represents a guess and the quality of this guess will be
measured in terms of the quantities: $\mathsf{P}^{\pi}\{\tau < \theta\}$, which is the probability for “false
alarm,” and $\mathsf{E}^{\pi}(\tau - \theta)^{+}$, which is the expected delay in detecting the “change-point,”
when the “alarm” is real, in that $\tau \ge \theta$.

One would like to construct a moment $\tau$ that minimizes simultaneously the
probability for “false alarm” and the expected delay in detection. But since such
a moment does not exist (except for some trivial situations), we introduce the
“Bayesian” risk (below we suppose that $c > 0$ is some appropriately chosen
constant)

$$R^{\pi}(\tau) = \mathsf{P}^{\pi}\{\tau < \theta\} + c\, \mathsf{E}^{\pi}(\tau - \theta)^{+},$$
and say that the moment $\tau^*$ is optimal, if, for any $\pi \in [0, 1]$, one can claim
that $\mathsf{P}^{\pi}\{\tau^* < \infty\} = 1$ and that $R^{\pi}(\tau^*) \le R^{\pi}(\tau)$, for every $\mathsf{P}^{\pi}$-finite Markov
moment $\tau$.

According to Problem 8.9.8, a moment $\tau^*$ with the above properties exists and,
in the special case where $p_k = (1 - p)^{k-1} p$, $0 < p < 1$, $k \ge 1$, can be expressed
as:

$$\tau^* = \inf\{n \ge 0 : \pi_n \ge A\},$$
where the constant $A$, which, in general, may depend on $c$ and $p$, is the
“alarm-trigger” threshold, while $\pi_n$ is the posterior probability for the “change-point” to
occur no later than the $n$-th observation:


$$\pi_n = \mathsf{P}^{\pi}(\theta \le n \mid \mathscr{F}_n), \quad n \ge 0, \quad \pi_0 = \pi.$$


(a) Prove that the posterior probabilities $\pi_n$, $n \ge 0$, are subject to the following
recursive relations:


$$\pi_{n+1} = \frac{\pi_n f_1(X_{n+1}) + p(1 - \pi_n) f_1(X_{n+1})}{\pi_n f_1(X_{n+1}) + p(1 - \pi_n) f_1(X_{n+1}) + (1 - p)(1 - \pi_n) f_0(X_{n+1})}.$$


(b) Prove that if $\varphi_n = \pi_n/(1 - \pi_n)$, then


$$\varphi_{n+1} = (p + \varphi_n)\, \frac{f_1(X_{n+1})}{(1 - p) f_0(X_{n+1})}.$$

(c) Setting $\varphi_n = \varphi_n(p)$, $\pi = 0$ and $\psi_n = \lim_{p \downarrow 0} \varphi_n(p)/p$, prove that


$$\psi_{n+1} = (1 + \psi_n)\, \frac{f_1(X_{n+1})}{f_0(X_{n+1})}, \quad \psi_0 = 0.$$

Remark. If we set $\theta_n = I(\theta \le n)$, then $\pi_n = \mathsf{E}^{\pi}(\theta_n \mid \mathscr{F}_n)$ is the mean-square
optimal estimate of $\theta_n$ from the observations $X_1, \ldots, X_n$. From (a), (b) and (c) one
can conclude that the statistics $\pi_n$, $\varphi_n$ and $\psi_n$ are governed by nonlinear recursive
relations, which are said to define the nonlinear filter (for the problem of estimating
the values $(\theta_n)_{n \ge 0}$ from the observations $X_1, \ldots, X_n$).


(d) Prove that each of the sequences $(\pi_n)_{n \ge 0}$, $(\varphi_n)_{n \ge 0}$ and $(\psi_n)_{n \ge 0}$ constitutes a
Markov chain.
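
Remark (numerical sketch). The recursions in (a) and (b) are easy to run on simulated data. The code below is a minimal illustration under assumptions that are not part of the problem: the pre-change and post-change laws are taken to be $N(0,1)$ and $N(1,1)$ (densities with respect to Lebesgue measure, which is harmless because the recursions depend only on the ratio $f_1/f_0$), the change-point is fixed at $\theta = 30$, and the values $p = 0.05$ and $A = 0.95$ are hypothetical.

```python
import math
import random


def posterior_step(pi_n, x, p, f0, f1):
    """One step of recursion (a): pi_n -> pi_{n+1} for the new observation x."""
    num = pi_n * f1(x) + p * (1.0 - pi_n) * f1(x)
    den = num + (1.0 - p) * (1.0 - pi_n) * f0(x)
    return num / den


def odds_step(phi_n, x, p, f0, f1):
    """One step of recursion (b) for phi_n = pi_n / (1 - pi_n)."""
    return (p + phi_n) * f1(x) / ((1.0 - p) * f0(x))


def normal_pdf(mean):
    """Density of N(mean, 1); an illustrative choice of f0 and f1."""
    return lambda x: math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def run_filter(xs, pi0, p, f0, f1):
    """Return [pi_0, pi_1, ..., pi_n] for the observations xs = [X_1, ..., X_n]."""
    pis = [pi0]
    for x in xs:
        pis.append(posterior_step(pis[-1], x, p, f0, f1))
    return pis


random.seed(0)
f0, f1 = normal_pdf(0.0), normal_pdf(1.0)
theta = 30  # assumed change-point: X_k follows F_1 for k >= theta
xs = [random.gauss(0.0 if k < theta else 1.0, 1.0) for k in range(1, 101)]
pis = run_filter(xs, pi0=0.0, p=0.05, f0=f0, f1=f1)

# First time the posterior probability crosses the (hypothetical) threshold A.
alarm = next((n for n, v in enumerate(pis) if v >= 0.95), None)
print(alarm)
```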


Chapter 7 
Martingale Sequences 


7.1 The Notion of Martingale and Related Concepts 


Problem 7.1.1. Show the equivalence of conditions [ P §7.1, (2) and (3)]. 
Hint. The proof can be established by contradiction. 


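Problem 7.1.2. Let $\sigma$ and $\tau$ be two Markov times. Show that $\tau + \sigma$, $\tau \vee \sigma$ and $\tau \wedge \sigma$
are also Markov times, and, if $\sigma(\omega) \le \tau(\omega)$ for all $\omega \in \Omega$, then $\mathscr{F}_\sigma \subseteq \mathscr{F}_\tau$. Does
this property still hold if $\sigma \le \tau$ only with probability 1?

Hint. If $\sigma(\omega) \le \tau(\omega)$ for all $\omega \in \Omega$, then, for every $A \in \mathscr{F}_\sigma$, one has


$$A \cap \{\tau = n\} = A \cap \{\sigma \le n\} \cap \{\tau = n\} \in \mathscr{F}_n,$$
and therefore $A \in \mathscr{F}_\tau$.


Problem 7.1.3. Prove that $\tau$ and $X_\tau$ are both $\mathscr{F}_\tau$-measurable.


Problem 7.1.4. Let $Y = (Y_n, \mathscr{F}_n)$ be a martingale (submartingale) and let
$V = (V_n, \mathscr{F}_{n-1})$ be some predictable sequence, for which one can claim that all
random variables $(V \cdot Y)_n$, $n \ge 0$, are integrable. Prove that $V \cdot Y$ is a martingale
(submartingale).


Problem 7.1.5. Let $\mathscr{G}_1 \supseteq \mathscr{G}_2 \supseteq \ldots$ be any non-increasing sequence of $\sigma$-algebras,
and suppose that $\xi$ is some integrable random variable. Setting $X_n = \mathsf{E}(\xi \mid \mathscr{G}_n)$,
prove that the sequence $(X_n)_{n \ge 1}$ forms a reverse martingale, i.e.,


$$\mathsf{E}(X_n \mid X_{n+1}, X_{n+2}, \ldots) = X_{n+1} \quad (\mathsf{P}\text{-a.e.}), \quad \text{for every } n \ge 1.$$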


Problem 7.1.6. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
chosen so that $\mathsf{P}\{\xi_i = 0\} = \mathsf{P}\{\xi_i = 2\} = \frac{1}{2}$, and let $X_n = \prod_{i=1}^{n} \xi_i$. Prove that it is
not possible to find an integrable random variable $\xi$, and a non-decreasing family of
$\sigma$-algebras $(\mathscr{F}_n)$, so that one can write: $X_n = \mathsf{E}(\xi \mid \mathscr{F}_n)$ ($\mathsf{P}$-a.e.), for every $n \ge 1$.
Conclude that there are martingales that cannot be expressed as $(\mathsf{E}(\xi \mid \mathscr{F}_n))_{n \ge 1}$, for
some appropriate choice of $\xi$ and $(\mathscr{F}_n)_{n \ge 1}$ (comp. with [P §1.11, Example 3]).
Hint. The proof can be established by contradiction.
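
Remark (numerical sketch). The phenomenon behind this problem is that the mean of $X_n$ stays equal to 1 while almost all of the mass moves to 0. The following exact enumeration, with hypothetical function names, illustrates this for small $n$; it is an illustration only and not a proof.

```python
from fractions import Fraction
from itertools import product


def law_of_product(n):
    """Exact law of X_n = xi_1 * ... * xi_n, where each xi_i is 0 or 2 with probability 1/2."""
    law = {}
    weight = Fraction(1, 2 ** n)
    for xs in product((0, 2), repeat=n):
        value = 1
        for x in xs:
            value *= x
        law[value] = law.get(value, Fraction(0)) + weight
    return law


for n in (1, 5, 10):
    law = law_of_product(n)
    mean = sum(v * w for v, w in law.items())
    # The mean stays constant while P{X_n = 0} approaches 1.
    print(n, mean, law[0])
```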


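Problem 7.1.7. (a) Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with $\mathsf{E}|\xi_n| < \infty$ and $\mathsf{E}\xi_n = 0$, $n \ge 1$. Prove that, for every fixed $k \ge 1$, the sequence


$$X_n^{(k)} = \sum_{1 \le i_1 < \ldots < i_k \le n} \xi_{i_1} \cdots \xi_{i_k}, \quad n = k, k+1, \ldots$$


forms a martingale.
(b) Let $\xi_1, \xi_2, \ldots$ be any sequence of integrable random variables, for which


$$\mathsf{E}(\xi_{n+1} \mid \xi_1, \ldots, \xi_n) = \frac{\xi_1 + \ldots + \xi_n}{n}, \quad n \ge 1.$$


Prove that the sequence $X_n = \frac{1}{n}(\xi_1 + \ldots + \xi_n)$, $n \ge 1$, forms a martingale.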


Problem 7.1.8. Give an example of a martingale $X = (X_n, \mathscr{F}_n)_{n \ge 1}$, for which the
family $\{X_1, X_2, \ldots\}$ is not uniformly integrable.


Problem 7.1.9. Let $X = (X_n)_{n \ge 0}$ be a Markov chain ([P §8.1]) with countable
state-space $E = \{i, j, \ldots\}$ and with transition probabilities $p_{ij}$. Let $\psi = \psi(x)$,
$x \in E$, be any bounded function with the property that, for some $\lambda > 0$, one has


$$\sum_{j \in E} p_{ij}\, \psi(j) \le \lambda \psi(i), \quad \text{for any } i \in E.$$


(A function $\psi$ with the above properties is said to be $\lambda$-excessive, or $\lambda$-harmonic.)
Prove that the sequence $(\lambda^{-n} \psi(X_n))_{n \ge 0}$ forms a supermartingale.


Problem 7.1.10. Let $\tau_1, \tau_2, \ldots$ be any sequence of stopping times, chosen so that
either $\tau_n \downarrow \tau$, or $\tau_n \uparrow \tau$, in point-wise sense. Prove that $\tau$ must be a stopping time
in either case.


Problem 7.1.11. Prove that if $\sigma$ and $\tau$ are stopping times, then
$\mathscr{F}_{\sigma \wedge \tau} = \mathscr{F}_\sigma \cap \mathscr{F}_\tau$ and $\mathscr{F}_{\sigma \vee \tau} = \sigma(\mathscr{F}_\sigma \cup \mathscr{F}_\tau)$.


Problem 7.1.12. Let $\sigma$ be any (finite) stopping time, and let $\tau_1, \tau_2, \ldots$ be any
sequence of stopping times. Prove that if $\tau_n \uparrow \infty$, then


$$\mathscr{F}_{\sigma \wedge \tau_n} \uparrow \mathscr{F}_\sigma,$$


and, if $\tau_n \downarrow \sigma$, then $\mathscr{F}_\sigma = \bigcap_n \mathscr{F}_{\tau_n}$.


Problem 7.1.13. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent standard normal
random variables ($\xi_n \sim N(0, 1)$) and let $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$. Prove that the
sequence $(X_n)_{n \ge 1}$, given by


$$X_n = \frac{1}{\sqrt{n+1}} \exp\Big\{ \frac{S_n^2}{2(n+1)} \Big\},$$


is a martingale relative to the filtration $(\mathscr{F}_n^{\xi})_{n \ge 1}$, with $\mathscr{F}_n^{\xi} = \sigma(\xi_1, \ldots, \xi_n)$.


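Problem 7.1.14. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any stochastic sequence, set
$\Delta X_n = X_n - X_{n-1}$, $n \ge 1$, and let $\nu(\omega; \{n\} \times dx) = \mathsf{P}(\Delta X_n \in dx \mid \mathscr{F}_{n-1})(\omega)$ be any
regular version of the respective conditional distribution. Given any $u \in \mathbb{R}$, set


$$A(u)_0 = 0 \quad \text{and} \quad A(u)_n = \sum_{1 \le k \le n} \int (e^{iux} - 1)\, \nu(\omega; \{k\} \times dx), \quad n \ge 1.$$


Prove that the process $M(u) = (M_n(u), \mathscr{F}_n)$, $n \ge 1$, with


$$M_n(u) = e^{iuX_n} - \sum_{k=1}^{n} e^{iuX_{k-1}}\, \Delta A(u)_k,$$


is a martingale.


Problem 7.1.15. With the notation adopted in the previous problem, given any
$u \in \mathbb{R}$, set

$$G(u)_0 = 0 \quad \text{and} \quad G(u)_n = \prod_{1 \le k \le n} \int e^{iux}\, \nu(\omega; \{k\} \times dx), \quad n \ge 1,$$


and suppose that $|G(u)_n| > 0$, $n \ge 1$. Prove that the (complex-valued) sequence


$$\Big( \frac{e^{iuX_n}}{G(u)_n},\ \mathscr{F}_n \Big)_{n \ge 1},$$


i.e., the sequence


$$\Big( \frac{e^{iuX_n}}{\prod_{k=1}^{n} \mathsf{E}(e^{iu\Delta X_k} \mid \mathscr{F}_{k-1})},\ \mathscr{F}_n \Big)_{n \ge 1},$$


is a martingale.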


Problem 7.1.16. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any stochastic sequence, chosen so that
$|\Delta X_n| \le c$ ($\mathsf{P}$-a.e.), for some constant $c > 0$ and for all $n \ge 1$, where
$\Delta X_n = X_n - X_{n-1}$. Consider the (real-valued) sequence $Y = (Y_n, \mathscr{F}_n)_{n \ge 1}$, given by


$$Y_n = \frac{e^{X_n}}{\prod_{i=1}^{n} \mathsf{E}(e^{\Delta X_i} \mid \mathscr{F}_{i-1})},$$


and prove that $Y = (Y_n, \mathscr{F}_n)_{n \ge 1}$ is a martingale (comp. with Problem 7.1.15).
Will this property hold without the requirement for the variables $\Delta X_n$ to be
uniformly bounded?


Problem 7.1.17. Let $\xi_1, \ldots, \xi_n$ be independent and normally distributed ($N(0, 1)$)
random variables, and let $S_0 = 0$ and $S_k = \xi_1 + \ldots + \xi_k$, $1 \le k \le n$. Let
$\Phi(x) = \mathsf{P}\{\xi_1 \le x\}$, let $\mathscr{F}_k = \sigma(\xi_1, \ldots, \xi_k)$, $1 \le k \le n$, and let $\mathscr{F}_0 = \{\varnothing, \Omega\}$.


Prove that, for every $a \in \mathbb{R}$, the sequence $X = (X_k, \mathscr{F}_k)_{0 \le k \le n}$, given by


$$X_k = \Phi\Big( \frac{a - S_k}{\sqrt{n - k}} \Big),$$


is a martingale.


Problem 7.1.18. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random
variables, whose distribution is symmetric. Set $S_0 = 0$ and $S_k = \xi_1 + \ldots + \xi_k$,
$1 \le k \le n$. Let $F(x; k) = \mathsf{P}\{S_k \le x\}$. As a generalization of the result stated in
the previous problem, prove that the sequence $X = (X_k, \mathscr{F}_k)_{0 \le k \le n}$, given by


$$X_k = F(a - S_k;\ n - k) \quad \text{and} \quad \mathscr{F}_k = \sigma(\xi_1, \ldots, \xi_k),$$


is a martingale. (An application of this property can be found in Problem 7.2.12.)


Problem 7.1.19. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with (shared) distribution function $F = F(x)$, $x \in \mathbb{R}$,
and let


$$F_n(x; \omega) = \frac{1}{n} \sum_{k=1}^{n} I(\xi_k(\omega) \le x), \quad x \in \mathbb{R}, \quad n \ge 1,$$


be the associated sequence of empirical distribution functions (see [P §3.13]).
By using the result in Problem 7.1.5, prove that, for every fixed $x \in \mathbb{R}$, the
sequence $(Y_n(x), \mathscr{G}_n(x))_{n \ge 1}$, given by $Y_n(x) = F_n(x; \omega) - F(x)$,
$\mathscr{G}_n(x) = \sigma(Y_n(x), Y_{n+1}(x), \ldots)$, is a martingale.


Problem 7.1.20. Suppose that $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ and $Y = (Y_n, \mathscr{F}_n)_{n \ge 0}$ are two
submartingales.

(a) Prove that $X \vee Y = (X_n \vee Y_n, \mathscr{F}_n)_{n \ge 0}$ is also a submartingale.

(b) Can one claim that the following two sequences are submartingales:


$$X + Y = (X_n + Y_n, \mathscr{F}_n)_{n \ge 0}, \quad XY = (X_n Y_n, \mathscr{F}_n)_{n \ge 0}?$$


If yes, explain under what conditions; if not, explain why.
(c) Answer the analogous questions in the case where $X$ and $Y$ are martingales
and also in the case where $X$ and $Y$ are supermartingales.


Problem 7.1.21. Let $\xi_1, \xi_2, \ldots$ be any infinite sequence of exchangeable random
variables (i.e., random variables with the property that, for every $n \ge 1$, the
probability distribution of the vector $(\xi_1, \ldots, \xi_n)$ coincides with the probability
distribution of the vector $(\xi_{\pi_1}, \ldots, \xi_{\pi_n})$, for any permutation, $(\pi_1, \ldots, \pi_n)$, of the set
$(1, \ldots, n)$; for an equivalent definition see Problem 2.5.4). Suppose that $\mathsf{E}|\xi_1| < \infty$,
$S_n = \xi_1 + \ldots + \xi_n$ and let $\mathscr{G}_n = \sigma\big( \frac{S_n}{n}, \frac{S_{n+1}}{n+1}, \ldots \big)$.
As a generalization of [P §1.11, Example 4], prove that one has


$$\mathsf{E}\Big( \frac{S_n}{n} \,\Big|\, \mathscr{G}_{n+1} \Big) = \frac{S_{n+1}}{n+1} \quad (\mathsf{P}\text{-a.e.}), \quad n \ge 1,$$


i.e., the sequence $\big( \frac{S_n}{n}, \mathscr{G}_n \big)_{n \ge 1}$ forms a reverse martingale.


Problem 7.1.22. Prove that any reverse martingale is automatically uniformly
integrable.


Problem 7.1.23. The $\sigma$-algebra $\mathscr{F}_\tau$, associated with the Markov time $\tau$, is defined
as the collection of sets


$$\{A \in \mathscr{F} : A \cap \{\tau = n\} \in \mathscr{F}_n \text{ for all } n \ge 0\}.$$


Why can’t one define this $\sigma$-algebra as $\mathscr{F}_\tau = \sigma(\mathscr{F}_n : n \le \tau)$?


Problem 7.1.24. If $X = (X_n, \mathscr{F}_n)_{n \ge 1}$ is a martingale, then, for every
sub-sequence, $(n_k) \subseteq (n)$, one can claim that $(X_{n_k}, \mathscr{F}_{n_k})_{k \ge 1}$ is also a martingale. By
providing appropriate examples, prove that, in general, this property may not hold
for local martingales.


Problem 7.1.25. In martingale theory, a uniformly integrable supermartingale
$\Pi = (\Pi_n, \mathscr{F}_n)_{n \ge 0}$, with the property $\Pi_n(\omega) \to 0$ as $n \to \infty$, for every $\omega \in \Omega$
(point-wise convergence to 0), is called a potential.

Suppose that $\Pi = (\Pi_n, \mathscr{F}_n)_{n \ge 0}$ is a potential and let $\mathscr{F}_{-1} = \mathscr{F}_0$. Prove that
there is a unique predictable and non-decreasing sequence $A = (A_n, \mathscr{F}_{n-1})_{n \ge 0}$,
starting from 0, i.e., with $A_0 = 0$, for which one can write


$$\Pi_n = \mathsf{E}(A_\infty - A_n \mid \mathscr{F}_n), \quad n \ge 0.$$


Problem 7.1.26. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be a supermartingale. Prove that the
following conditions are equivalent:


(i) There is a submartingale, $Y = (Y_n, \mathscr{F}_n)_{n \ge 0}$, for which one can claim that
$X_n \ge Y_n$ ($\mathsf{P}$-a.e.), for all $n \ge 0$;
(ii) There is a unique Riesz decomposition of the form:


$$X_n = M_n + \Pi_n, \quad n \ge 0,$$


in which $M = (M_n, \mathscr{F}_n)_{n \ge 0}$ is a martingale and $\Pi = (\Pi_n, \mathscr{F}_n)_{n \ge 0}$ is a potential.


Problem 7.1.27. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any submartingale. Prove that one can
find a non-negative martingale, $M = (M_n, \mathscr{F}_n)_{n \ge 0}$, with the following properties:


$$X_n^{+} \le M_n, \quad n \ge 0, \quad \text{and} \quad \sup_n \mathsf{E}X_n^{+} = \sup_n \mathsf{E}M_n.$$
Hint. Use the fact that $X^{+} = (X_n^{+}, \mathscr{F}_n)_{n \ge 0}$ is also a submartingale and set
$M_n = \lim_{m \to \infty} \mathsf{E}(X_m^{+} \mid \mathscr{F}_n)$.


Problem 7.1.28. Suppose that the probability space $(\Omega, \mathscr{F}, \mathsf{P})$ is endowed with
the filtration $\mathbb{F} = (\mathscr{F}_n)_{n \ge 0}$, let $\sigma$ and $\tau$ be any two Markov times (for $\mathbb{F}$) with the
property $\sigma(\omega) \le \tau(\omega)$ for every $\omega \in \Omega$, and let $A_n = \{\omega : \sigma(\omega) < n \le \tau(\omega)\}$,
$n \ge 1$. Prove that $A_n \in \mathscr{F}_{n-1}$ for every $n \ge 1$. In other words, the sequence
$(X_n)_{n \ge 1}$, given by

$$X_n = \begin{cases} 1, & \text{if } \sigma(\omega) < n \le \tau(\omega), \\ 0 & \text{otherwise}, \end{cases}$$


is predictable, in that $X_n$ is $\mathscr{F}_{n-1}$-measurable for every $n \ge 1$.


Problem 7.1.29. (On [P §7.1, Theorem 2].) Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any
submartingale with Doob-decomposition $X_n = m_n + A_n$, $n \ge 0$, where $A_0 = 0$
and therefore $m_0 = X_0$. Prove that if $\{X_0, X_1, \ldots\}$ is a uniformly integrable family,
then $\mathsf{E}A_\infty < \infty$ and the family $\{m_0, m_1, \ldots\}$ is also uniformly integrable.


Problem 7.1.30. Suppose that $M = (M_n, \mathscr{F}_n)_{n \ge 0}$ is a square integrable
martingale. Prove that


$$\sup_n \mathsf{E}M_n^2 < \infty \iff \sum_{k \ge 1} \mathsf{E}(M_k - M_{k-1})^2 < \infty.$$


Problem 7.1.31. Let $\tau = \tau(\omega)$ be any Markov time for the filtration $(\mathscr{F}_n)_{n \ge 0}$ and
suppose that $f = f(n)$ is a non-decreasing function of $n \in \mathbb{N} = \{0, 1, 2, \ldots\}$,
chosen so that $f(n) \ge n$. Prove that $T(\omega) = f(\tau(\omega))$ is also a Markov time.


Problem 7.1.32. Consider the sequence $X = (X_n, \mathscr{F}_n)$ and suppose that this
sequence is a martingale with respect to the probability measure $\mathsf{P}$. Then suppose
that $\mathsf{Q}$ is another probability measure that is equivalent to $\mathsf{P}$ ($\mathsf{Q} \sim \mathsf{P}$). Prove by way
of example that the sequence $X = (X_n, \mathscr{F}_n)$ is not necessarily a martingale relative
to the measure $\mathsf{Q}$.


Problem 7.1.33. According to [P §7.1, Example 5], if $X = (X_n, \mathscr{F}_n)$ is a
submartingale and $g = g(x)$ is some convex and non-decreasing function with
the property $\mathsf{E}|g(X_n)| < \infty$, $n \ge 0$, then the sequence $(g(X_n), \mathscr{F}_n)$ is also a
submartingale. Give an example of a submartingale $(X_n)$ and a function $g = g(x)$,
which is convex but fails to be non-decreasing, for which $(g(X_n), \mathscr{F}_n)$ is not a
submartingale.


7.2 Invariance of the Martingale Property Under Random 
Time-Change 


Problem 7.2.1. Prove that [P §7.2, Theorem 1] remains valid in the case of
submartingales, provided that condition [P §7.2, (4)] is replaced with


$$\liminf_{n \to \infty} \int_{\{\tau_2 > n\}} X_n^{+}\, d\mathsf{P} = 0.$$


Hint. The proof is essentially the same as in [P §7.2, Theorem 1]. One only has
to notice that the relation $X_m \le X_m^{+}$ implies the following chain of inequalities:


$$\int_{B \cap \{\tau_2 \ge n\}} X_{\tau_2}\, d\mathsf{P} \ge \limsup_{m \to \infty} \Big[ \int_{B \cap \{\tau_2 \ge n\}} X_n\, d\mathsf{P} - \int_{B \cap \{\tau_2 > m\}} X_m\, d\mathsf{P} \Big]$$


$$\ge \int_{B \cap \{\tau_2 \ge n\}} X_n\, d\mathsf{P} - \liminf_{m \to \infty} \int_{B \cap \{\tau_2 > m\}} X_m^{+}\, d\mathsf{P}.$$


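Problem 7.2.2. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any square integrable martingale, with
$\mathsf{E}X_0 = 0$, let $\tau$ be a stopping time, and suppose that


$$\liminf_{n \to \infty} \int_{\{\tau > n\}} X_n^2\, d\mathsf{P} = 0.$$


Prove that

$$\mathsf{E}X_\tau^2 = \mathsf{E}\langle X \rangle_\tau \quad \Big( = \mathsf{E} \sum_{j=0}^{\tau} (\Delta X_j)^2 \Big),$$


where $\Delta X_0 = X_0$, $\Delta X_j = X_j - X_{j-1}$, $j \ge 1$.
Hint. In order to prove the inequality


$$\mathsf{E}X_\tau^2 \le \mathsf{E} \sum_{j=0}^{\tau} (\Delta X_j)^2,$$
use [P §7.2, Theorem 1] and Fatou’s lemma ($\mathsf{E} \liminf_N X_{\tau \wedge N}^2 \le \liminf_N \mathsf{E}X_{\tau \wedge N}^2$). To
prove the inequality in the opposite direction, observe that


$$\mathsf{E}X_\tau^2 \ge \mathsf{E}X_{\tau \wedge N}^2 = \mathsf{E} \sum_{j=0}^{\tau \wedge N} (\Delta X_j)^2,$$
and use Fatou’s lemma again.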


Problem 7.2.3. Prove that for every martingale, or, for every non-negative
submartingale, $X = (X_n, \mathscr{F}_n)_{n \ge 0}$, and for every stopping time $\tau$, one has


$$\mathsf{E}|X_\tau| \le \lim_{n \to \infty} \mathsf{E}|X_n|.$$
Hint. Use the fact that $|X|$ is a submartingale and that, by [P §7.2, Theorem 1],
$\mathsf{E}|X_{\tau \wedge N}| \le \mathsf{E}|X_N|$, for every $N \ge 1$. Consequently, $\lim_N \mathsf{E}|X_{\tau \wedge N}| \le \lim_N \mathsf{E}|X_N|$.
The proof can now be completed by using Fatou’s lemma.


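Problem 7.2.4. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be a supermartingale, and suppose that there
is a random variable $\xi$, with $\mathsf{E}|\xi| < \infty$, for which one can write $X_n \ge \mathsf{E}(\xi \mid \mathscr{F}_n)$
($\mathsf{P}$-a.e.), for every $n \ge 0$. Prove that if $\tau_1$ and $\tau_2$ are two stopping times with
$\mathsf{P}\{\tau_1 \le \tau_2\} = 1$, then

$$X_{\tau_1} \ge \mathsf{E}(X_{\tau_2} \mid \mathscr{F}_{\tau_1}) \quad (\mathsf{P}\text{-a.e.}).$$


Hint. By using the result from [P §7.2, Theorem 1], verify the relations
$\mathsf{E}|X_{\tau_1}| < \infty$, $\mathsf{E}|X_{\tau_2}| < \infty$ and


$$\liminf_{n \to \infty} \int_{\{\tau_2 > n\}} |X_n|\, d\mathsf{P} = 0.$$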


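Problem 7.2.5. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables
with $\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$, and let $a$ and $b$ be any two positive numbers
with $b > a$. Given any $n \ge 1$, set


$$X_n = a \sum_{k=1}^{n} I(\xi_k = +1) - b \sum_{k=1}^{n} I(\xi_k = -1)$$


and let

$$\tau = \inf\{n \ge 1 : X_n \le -r\}, \quad r > 0.$$


Prove that $\mathsf{E}e^{\lambda \tau} < \infty$, for $\lambda \le \alpha_0$, and that $\mathsf{E}e^{\lambda \tau} = \infty$, for $\lambda > \alpha_0$, where


$$\alpha_0 = \frac{b}{a+b} \ln \frac{2b}{a+b} + \frac{a}{a+b} \ln \frac{2a}{a+b}.$$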


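Problem 7.2.6. Suppose that $\xi_1, \xi_2, \ldots$ is some sequence of independent random
variables with $\mathsf{E}\xi_i = 0$ and $\mathsf{D}\xi_i = \sigma_i^2$, and let $S_n = \xi_1 + \ldots + \xi_n$ and
$\mathscr{F}_n^{\xi} = \sigma\{\xi_1, \ldots, \xi_n\}$, for $n \ge 1$. Prove the following generalization of Wald’s
identities [P §7.2, (13) and (14)], in which $\tau$ is a stopping time for $(\mathscr{F}_n^{\xi})$: if
$\mathsf{E} \sum_{j=1}^{\tau} \mathsf{E}|\xi_j| < \infty$, then $\mathsf{E}S_\tau = 0$, and, if
$\mathsf{E} \sum_{j=1}^{\tau} \mathsf{E}\xi_j^2 < \infty$, then


$$\mathsf{E}S_\tau^2 = \mathsf{E} \sum_{j=1}^{\tau} \xi_j^2 = \mathsf{E} \sum_{j=1}^{\tau} \sigma_j^2.$$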


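Problem 7.2.7. Let $X = (X_n, \mathscr{F}_n)_{n \ge 1}$ be a square integrable martingale and let $\tau$
be any stopping time for $(\mathscr{F}_n)$. Prove that


$$\mathsf{E}X_\tau^2 \le \mathsf{E} \sum_{n=1}^{\tau} (\Delta X_n)^2.$$


In addition, prove that if


$$\liminf_{n \to \infty} \mathsf{E}(X_n^2 I(\tau > n)) < \infty, \quad \text{or} \quad \liminf_{n \to \infty} \mathsf{E}(|X_n| I(\tau > n)) = 0,$$


then $\mathsf{E} \sum_{n=1}^{\tau} (\Delta X_n)^2 = \mathsf{E}X_\tau^2$.


Problem 7.2.8. Let $X = (X_n, \mathscr{F}_n)_{n \ge 1}$ be any submartingale and let $\tau_1 \le \tau_2 \le \ldots$
be stopping times for $(\mathscr{F}_n)$, such that the expectations $\mathsf{E}X_{\tau_m}$ are well defined and

$$\liminf_{n \to \infty} \mathsf{E}(X_n^{+} I(\tau_m > n)) = 0, \quad m \ge 1.$$

Prove that the sequence $(X_{\tau_m}, \mathscr{F}_{\tau_m})_{m \ge 1}$ is a submartingale. (As usual, we define
$\mathscr{F}_{\tau_m} = \{A \in \mathscr{F} : A \cap \{\tau_m = j\} \in \mathscr{F}_j,\ j \ge 1\}$.)


Problem 7.2.9. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be a non-negative supermartingale and let
$\tau_0 \le \tau_1 \le \ldots$ be stopping times for $(\mathscr{F}_n)$. Show that the sequence $(X_{\tau_n}, \mathscr{F}_{\tau_n})_{n \ge 0}$ is
also a supermartingale.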


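Problem 7.2.10. As an extension of the elementary theorem in renewal theory
(see [P §7.2, 4]), prove that (under the assumption $\mathsf{D}\sigma_1 < \infty$ and with the notation
$a = (\mathsf{E}\sigma_1)^{-1}$) one must have


$$\frac{\mathsf{D}N_t}{t} \to a^3\, \mathsf{D}\sigma_1 \quad \text{as } t \to \infty.$$

Furthermore, the central limit theorem holds:

$$\mathsf{P}\Big\{ \frac{N_t - at}{\sqrt{a^3 t\, \mathsf{D}\sigma_1}} \le x \Big\} \to \Phi(x) \quad \text{as } t \to \infty.$$


Problem 7.2.11. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, let $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$, and let


$$\tau = \inf\{n \ge 1 : S_n > 0\}$$


(as usual, we set $\tau = \infty$, if $S_n \le 0$ for all $n \ge 1$). Prove that if $\mathsf{E}\xi_1 = 0$ then $\mathsf{E}\tau = \infty$.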


Problem 7.2.12. By using the martingale property of the sequence
$X = (X_k, \mathscr{F}_k)_{0 \le k \le n}$ from Problem 7.1.18, and also the property $\mathsf{E}X_0 = \mathsf{E}X_{\tau_a}$ (see
[P §7.2, Corollary 1]), where $\tau_a = \min\{0 \le k \le n : S_k > a\}$, $a > 0$ (with the
understanding that $\tau_a = n + 1$, if $S_k \le a$ for all $0 \le k \le n$), prove the inequality
(see [P §4.4, Lemma 1])


$$\mathsf{P}\Big\{ \max_{0 \le k \le n} S_k > a \Big\} \le 2\,\mathsf{P}\{S_n > a\}.$$


Problem 7.2.13. As an extension of the statements in [P §7.2, Theorems 1 and 2],
prove the following result: Consider the martingale $X = (X_n, \mathscr{F}_n)$ and let $\tau$ be any
stopping time with $\mathsf{P}\{\tau < \infty\} = 1$, for which $\mathsf{E}|X_\tau| < \infty$ and
$\liminf_n \mathsf{E}[|X_n| I(\tau > n)] = 0$. Then:


$$\lim_n \mathsf{E}[|X_n| I(\tau > n)] = 0;$$


$$\mathsf{E}|X_\tau - X_{\tau \wedge n}| \to 0 \quad \text{as } n \to \infty;$$


and $\mathsf{E}X_\tau = \mathsf{E}X_0$.


Problem 7.2.14. (On [P §7.2, Theorems 1 and 2].) Suppose that $\tau_1$ and $\tau_2$ are
two finite stopping times with $\mathsf{P}\{\tau_1 \le \tau_2\} = 1$, and let $X = (X_n)_{n \ge 0}$ be some
martingale (all defined on the same probability space). Prove that if

$$\mathsf{E} \sup_{n \le \tau_2} |X_n| < \infty, \qquad (*)$$

then $\mathsf{E}(X_{\tau_2} \mid \mathscr{F}_{\tau_1}) = X_{\tau_1}$ ($\mathsf{P}$-a.e.).

Hint. Use the fact that condition $(*)$ implies that the family of random variables
$\{|X_{\tau_2 \wedge 0}|, |X_{\tau_2 \wedge 1}|, \ldots\}$ is uniformly integrable.


Problem 7.2.15. In the context of [P §7.2, Example 1], consider the stopping time
$\tau$ defined in [P §7.2, (16)], and prove that $\mathsf{E}\tau^p < \infty$, for every $p \ge 1$.


Problem 7.2.16. Give an example of a martingale $X = (X_n, \mathscr{F}_n)_{n \ge 0}$, and a
stopping time $\tau$, with the property that (see [P §7.2, Theorem 1]) the condition


$$\liminf_{n \to \infty} \int_{\{\tau > n\}} |X_n|\, d\mathsf{P} = 0$$

holds, but the condition $\mathsf{E}|X_\tau| < \infty$ fails, i.e., $\mathsf{E}|X_\tau| = \infty$.


Problem 7.2.17. Let $M = (M_n, \mathscr{F}_n)_{n \ge 0}$ be any martingale and, given any $N \ge 1$,
set $\tau_N = \inf\{n \ge 0 : |M_n| \ge N\}$, with the understanding that $\inf \varnothing = \infty$. Prove
that the martingale $M$ is uniformly integrable if and only if


$$\lim_{N \to \infty} \mathsf{E}|M_{\tau_N}| I(\tau_N < \infty) = 0.$$


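Problem 7.2.18. (On [P §7.2, Examples 1 and 2].) Let $\xi_1, \xi_2, \ldots$ be any sequence
of independent and symmetric Bernoulli random variables ($\mathsf{P}\{\xi_i = 1\} = \mathsf{P}\{\xi_i = -1\} = 1/2$,
for $i \ge 1$). Consider the stopping time

$$\tau = \inf\{n \ge 0 : S_n = 1\},$$

where $S_0 = 0$ and $S_n = \xi_1 + \ldots + \xi_n$ (as usual, we suppose that $\inf \varnothing = \infty$).

(a) Prove that, for every $\lambda \in \mathbb{R}$, the sequence $(X_n^{\lambda})_{n \ge 0}$, given by

$$X_n^{\lambda} = \frac{e^{\lambda S_n}}{(\cosh \lambda)^n},$$


forms a martingale. By using this property, prove that $\mathsf{P}\{\tau < \infty\} = 1$, $\mathsf{E}\tau = \infty$
and

$$\mathsf{E}(\cosh \lambda)^{-\tau} = e^{-\lambda}, \quad \text{for every } \lambda > 0$$

(comp. with [P §7.2, (18)]).
(b) With $\alpha = 1/\cosh \lambda$, the above formula implies that


$$\mathsf{E}\alpha^{\tau} = \sum_{n \ge 1} \alpha^n\, \mathsf{P}\{\tau = n\} = \frac{1}{\alpha} \big[ 1 - \sqrt{1 - \alpha^2} \big]$$


(see also Problem 8.8.19). By using the last relation, prove that

$$\mathsf{P}\{\tau = 2n - 1\} = (-1)^{n+1} C_{1/2}^{n},$$


where

$$C_x^n = \frac{x(x-1) \ldots (x-n+1)}{n!}$$


(see Problem 1.2.22).

(c) Let $I = \inf\{S_n : n \le \tau\}$. Prove that for every $k \ge 0$ one has

$$\mathsf{P}\{I \le -k\} = \frac{1}{k+1}.$$

(d) Let $\tau_k = \inf\{n \ge 0 : S_n = 1 \text{ or } S_n = -k\}$. Show that $\tau_k \to \tau$ ($\mathsf{P}$-a.e.)
and $S_{\tau_k} \to S_\tau$ as $k \to \infty$ ($\mathsf{P}$-a.e.), and yet $\mathsf{E}S_{\tau_k} \not\to \mathsf{E}S_\tau$ (in fact, $\mathsf{E}S_{\tau_k} = 0$, while
$\mathsf{E}S_\tau = 1$). Explain why the convergence of the expected values does not hold ($\mathsf{E}S_{\tau_k}$
does not converge to $\mathsf{E}S_\tau$ as $k \to \infty$), in spite of the fact that $S_{\tau_k} \to S_\tau$ ($\mathsf{P}$-a.e.).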
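
Remark (numerical sketch). The formula in part (b) for the law of the first passage time to level 1 can be compared with an exact computation for small $n$. The code below propagates the mass of the paths of the symmetric walk that have not yet reached level 1 and compares $\mathsf{P}\{\tau = 2n - 1\}$ with $(-1)^{n+1} C_{1/2}^{n}$. The function names are hypothetical, and the check covers only finitely many $n$, so it is an illustration rather than a proof.

```python
from fractions import Fraction


def first_passage_probs(max_n):
    """Exact P{tau = n}, n = 1..max_n, for tau = inf{n: S_n = 1}, symmetric walk from 0."""
    alive = {0: Fraction(1)}  # mass of paths that have not yet reached level 1
    probs = {}
    for n in range(1, max_n + 1):
        nxt = {}
        hit = Fraction(0)
        for s, w in alive.items():
            for step in (-1, 1):
                if s + step == 1:
                    hit += w / 2
                else:
                    nxt[s + step] = nxt.get(s + step, Fraction(0)) + w / 2
        probs[n] = hit
        alive = nxt
    return probs


def gen_binom_half(n):
    """Generalized binomial coefficient (1/2)(1/2 - 1)...(1/2 - n + 1) / n!."""
    c = Fraction(1)
    for j in range(n):
        c *= (Fraction(1, 2) - j) / (j + 1)
    return c


probs = first_passage_probs(15)
for n in range(1, 5):
    # The two printed columns should agree.
    print(2 * n - 1, probs[2 * n - 1], (-1) ** (n + 1) * gen_binom_half(n))
```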


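Problem 7.2.19. The argument of [P §7.2, Theorem 2] is based on the assumption
that the expectation of the stopping time $\tau$ is finite (i.e., $\mathsf{E}\tau < \infty$). Prove that if, for
some $0 < \varepsilon < 1$ and some integer $N$, one can write


$$\mathsf{P}(\tau \le n + N \mid \mathscr{F}_n) > \varepsilon \quad (\mathsf{P}\text{-a.e.}) \quad \text{for every } n \ge 1,$$


then one can claim $\mathsf{E}\tau < \infty$.
Hint. Show by induction that $\mathsf{P}\{\tau > kN\} \le (1 - \varepsilon)^k$, $k \ge 1$.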


Problem 7.2.20. Let $m(t)$ denote the renewal function, introduced in [P §7.2, 4].
The elementary theorem of renewal theory says that $m(t)/t \to 1/\mu$ as $t \to \infty$. The
next two statements refine this claim further.

(a) Suppose that the renewal process $N = (N_t)_{t \ge 0}$ lives on a lattice of size $d$,
i.e., for some fixed $d > 0$ one can claim that the distribution of the random
variable $\sigma_1$ is supported by the set $\{0, d, 2d, \ldots\}$. Then (Kolmogorov, 1936)


$$\sum_{k=1}^{\infty} \mathsf{P}\{T_k = nd\} \to \frac{d}{\mu} \quad \text{as } n \to \infty.$$


(b) If there is no $d > 0$, for which one can claim that the renewal process
$N = (N_t)_{t \ge 0}$ lives on a lattice of size $d$, then (Blackwell, 1948)


$$\sum_{k=1}^{\infty} \mathsf{P}\{t < T_k \le t + h\} \to \frac{h}{\mu} \quad \text{as } t \to \infty, \qquad (*)$$
for every $h > 0$. (Note that the sum in $(*)$ gives $m(t + h) - m(t)$.)
Argue that the above two statements are plausible, or, better yet, just prove them.
Hint. With regard to (a), one must become familiar with the proof of [P §8.6,
Theorem 2].


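Problem 7.2.21. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent Bernoulli random
variables, with $\mathsf{P}\{\xi_i = 1\} = p$, $\mathsf{P}\{\xi_i = -1\} = q$, $p + q = 1$, $i \ge 1$. Given some
integers $x$, $a$ and $b$, with $a \le 0 \le b$, set $S_n(x) = x + \xi_1 + \ldots + \xi_n$, and let


$$\tau_a(x) = \inf\{n \ge 0 : S_n(x) \le a\},$$

$$\tau^b(x) = \inf\{n \ge 0 : S_n(x) \ge b\},$$

$$\tau_a^b(x) = \inf\{n \ge 0 : S_n(x) \le a \text{ or } S_n(x) \ge b\}.$$
Prove that:


$$\mathsf{P}\{\tau_a(x) < \infty\} = \begin{cases} 1, & \text{if } p \le q \text{ and } x \ge a, \\ (q/p)^{x-a}, & \text{if } p > q \text{ and } x \ge a; \end{cases}$$


$$\mathsf{P}\{\tau^b(x) < \infty\} = \begin{cases} 1, & \text{if } p \ge q \text{ and } x \le b, \\ (p/q)^{b-x}, & \text{if } p < q \text{ and } x \le b; \end{cases}$$


$$\mathsf{P}\{\tau_a^b(x) < \infty\} = 1, \quad a \le x \le b;$$


and that for $a \le x \le b$


$$\mathsf{E}\tau_a^b(x) = \frac{x - a}{q - p} - \frac{b - a}{q - p} \cdot \frac{(q/p)^x - (q/p)^a}{(q/p)^b - (q/p)^a}, \quad \text{if } p \ne q,$$
$$\mathsf{E}\tau_a^b(x) = (b - x)(x - a), \quad \text{if } p = q = 1/2.$$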
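
Remark (numerical sketch). The expected exit time $m(x) = \mathsf{E}\tau_a^b(x)$ solves the boundary-value problem $m(x) = 1 + p\,m(x+1) + q\,m(x-1)$ for $a < x < b$, with $m(a) = m(b) = 0$. The sketch below solves this system by Gauss-Seidel sweeps and compares the result with the closed-form expressions; in the symmetric case it uses $(b - x)(x - a)$, the expression that vanishes at both endpoints. The function names, the sweep count and the values $a = -3$, $b = 4$ are illustrative assumptions.

```python
def expected_exit_time(x, a, b, p):
    """Closed-form expected exit time of x + xi_1 + ... + xi_n from the interval (a, b)."""
    q = 1.0 - p
    if abs(p - q) < 1e-12:
        return float((b - x) * (x - a))
    r = q / p
    return (x - a) / (q - p) - (b - a) / (q - p) * (r ** x - r ** a) / (r ** b - r ** a)


def expected_exit_time_numeric(a, b, p, sweeps=2000):
    """Solve m(x) = 1 + p*m(x+1) + q*m(x-1), m(a) = m(b) = 0, by Gauss-Seidel sweeps."""
    q = 1.0 - p
    m = {x: 0.0 for x in range(a, b + 1)}
    for _ in range(sweeps):
        for x in range(a + 1, b):
            m[x] = 1.0 + p * m[x + 1] + q * m[x - 1]
    return m


numeric = expected_exit_time_numeric(-3, 4, 0.6)
for x in range(-3, 5):
    # The numerical solution and the closed form should agree at every x.
    print(x, numeric[x], expected_exit_time(x, -3, 4, 0.6))
```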


Problem 7.2.22. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, with values in the set $\{-1, 0, 1, \ldots\}$, and with expected
value $\mu < 0$. Let $S_0 = 1$, $S_n = 1 + \xi_1 + \ldots + \xi_n$, $n \ge 1$, and let
$\tau = \inf\{n \ge 1 : S_n = 0\}$. Prove that $\mathsf{E}\tau = \frac{1}{|\mu|}$.


7.3 Fundamental Inequalities 


Problem 7.3.1. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any non-negative submartingale and let
$V = (V_n, \mathscr{F}_{n-1})_{n \ge 0}$ be any predictable sequence (as usual, we set $\mathscr{F}_{-1} = \mathscr{F}_0$),
with $0 \le V_{n+1} \le V_n \le C$ ($\mathsf{P}$-a.e.), where $C$ is some constant. Prove the following
generalization of the inequality in [P §7.3, (1)]: for every fixed $\lambda > 0$, one has


$$\lambda\, \mathsf{P}\Big\{ \max_{0 \le k \le n} V_k X_k \ge \lambda \Big\} + \int_{\{\max_{0 \le k \le n} V_k X_k < \lambda\}} V_n X_n\, d\mathsf{P} \le \sum_{k=0}^{n} \mathsf{E}V_k \Delta X_k,$$


with the understanding that $\Delta X_0 = X_0$.


Problem 7.3.2. Prove the following result, known as Krickeberg’s decomposition:
every martingale $X = (X_n, \mathscr{F}_n)_{n \ge 0}$, that has the property $\sup_n \mathsf{E}|X_n| < \infty$, can be
written as the difference between two non-negative martingales.


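Problem 7.3.3. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
let $S_n = \xi_1 + \ldots + \xi_n$, and let $S_{m,n} = \sum_{j=m+1}^{n} \xi_j$. Prove the following relation,
which is known as the Ottaviani inequality,

$$\mathsf{P}\Big\{ \max_{1 \le j \le n} |S_j| > 2t \Big\} \le \frac{\mathsf{P}\{|S_n| > t\}}{\min_{1 \le j \le n} \mathsf{P}\{|S_{j,n}| \le t\}}, \quad t > 0,$$


and conclude that (under the assumption $\mathsf{E}\xi_i = 0$, for $i \ge 1$) one must have


$$\int_0^{\infty} \mathsf{P}\Big\{ \max_{1 \le j \le n} |S_j| > 2t \Big\}\, dt \le 2\mathsf{E}|S_n| + 2 \int_{2\mathsf{E}|S_n|}^{\infty} \mathsf{P}\{|S_n| > t\}\, dt. \qquad (*)$$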


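Hint. To establish the Ottaviani inequality, let $A = \{\max_{1 \le k \le n} |S_k| > 2t\}$, and
let


$$A_k = \{|S_i| \le 2t,\ i = 1, \ldots, k-1;\ |S_k| > 2t\}, \quad \text{for } 1 \le k \le n.$$


Then $A = \bigcup_{k=1}^{n} A_k$ (a disjoint union), and one can show that for any $t > 0$


$$\mathsf{P}(A) \Big[ \min_{1 \le j \le n} \mathsf{P}\{|S_{j,n}| \le t\} \Big] = (\mathsf{P}(A_1) + \ldots + \mathsf{P}(A_n)) \Big[ \min_{1 \le j \le n} \mathsf{P}\{|S_{j,n}| \le t\} \Big]$$
$$\le \mathsf{P}(A_1 \cap \{|S_{1,n}| \le t\}) + \ldots + \mathsf{P}(A_n \cap \{|S_{n,n}| \le t\}) \le \mathsf{P}\{|S_n| > t\}.$$


In order to establish $(*)$ (under the assumption $\mathsf{E}\xi_i = 0$, for $i \ge 1$), one only has
to show that


$$\int_0^{\infty} \mathsf{P}\Big\{ \max_{1 \le j \le n} |S_j| > 2t \Big\}\, dt \le \int_0^{2\mathsf{E}|S_n|} dt + \int_{2\mathsf{E}|S_n|}^{\infty} \frac{\mathsf{P}\{|S_n| > t\}}{1 - \max_{1 \le j \le n} \mathsf{P}\{|S_{j,n}| > t\}}\, dt,$$


and that for $t \ge 2\mathsf{E}|S_n|$


$$1 - \max_{1 \le j \le n} \mathsf{P}\{|S_{j,n}| > t\} \ge 1 - \max_{1 \le j \le n} \mathsf{P}\{|S_{j,n}| > 2\mathsf{E}|S_n|\} \ge 1 - \max_{1 \le j \le n} \frac{\mathsf{E}|S_{j,n}|}{2\mathsf{E}|S_n|} \ge 1 - \frac{1}{2} = \frac{1}{2}.$$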

Problem 7.3.4. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
with $E\xi_i = 0$, $i \ge 1$. By using (*) in Problem 7.3.3, prove the following stronger
version of the inequality in [P §7.3, (10)]:

$$
ES_n^* \le 8E|S_n|, \quad \text{where } S_n^* = \max_{1 \le j \le n} |S_j|.
$$

Problem 7.3.5. Prove the formula in [P §7.3, (16)].
Problem 7.3.6. Prove the inequality in [ P §7.3, (19)]. 


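Problem 7.3.7. Consider the $\sigma$-algebras $\mathcal{F}_0, \ldots, \mathcal{F}_n$ with $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n$,
and let the events $A_k \in \mathcal{F}_k$, $k = 1, \ldots, n$, be arbitrarily chosen. By using [P §7.3,
(22)], prove Dvoretzky's inequality:

$$
P\Big(\bigcup_{k=1}^{n} A_k\Big) \le \lambda + P\Big\{\sum_{k=1}^{n} P(A_k \mid \mathcal{F}_{k-1}) > \lambda\Big\}, \quad \text{for every } \lambda > 0.
$$

Hint. Define $X_k = I_{A_k}$, $k = 1, \ldots, n$, and notice that

$$
X_n^* = \max_{1 \le k \le n} |X_k| = I_{\bigcup_{k=1}^{n} A_k}.
$$

If $B_n = \sum_{k=1}^{n} P(A_k \mid \mathcal{F}_{k-1})$, then [P §7.3, (22)] implies that

$$
P\{X_n^* \ge 1\} \le E(B_n \wedge \varepsilon) + P\{B_n \ge \varepsilon\},
$$

from where the required inequality easily follows.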


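Problem 7.3.8. Let $X = (X_n)_{n \ge 1}$ be any square integrable martingale, and let
$(b_n)_{n \ge 1}$ be any non-decreasing sequence of positive real numbers. Prove the Hájek-Rényi
inequality:

$$
P\Big\{\max_{1 \le k \le n} \Big|\frac{X_k}{b_k}\Big| \ge \lambda\Big\} \le \frac{1}{\lambda^2} \sum_{k=1}^{n} \frac{E(\Delta X_k)^2}{b_k^2}, \quad \Delta X_k = X_k - X_{k-1}, \quad X_0 = 0.
$$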


Problem 7.3.9. Let $X = (X_n)_{n \ge 1}$ be any submartingale and let $g(x)$ be any
increasing function, which is non-negative and convex. Prove that, for every
positive $h$, and for every real $x$, one has

$$
P\Big\{\max_{1 \le k \le n} X_k \ge x\Big\} \le \frac{Eg(hX_n)}{g(hx)}.
$$

In particular, one has the following exponential analog of Doob's inequality:

$$
P\Big\{\max_{1 \le k \le n} X_k \ge x\Big\} \le e^{-hx} E e^{hX_n}.
$$

(Compare this result with the exponential analog of Kolmogorov's inequality, established
in Problem 4.2.23.)


Problem 7.3.10. Let $\xi_1, \xi_2, \ldots$ be independent random variables, with $E\xi_n = 0$
and $E\xi_n^2 = 1$, for $n \ge 1$, and let $\tau = \inf\{n \ge 1 : \sum_{i=1}^{n} \xi_i > 0\}$, with the
understanding that $\inf \emptyset = \infty$. Prove that $E\tau^{1/2} = \infty$.

Problem 7.3.11. Let $\xi = (\xi_n)_{n \ge 1}$ be any martingale difference. Prove that, for
every $1 < p \le 2$, one can find a constant $C_p$, for which the following inequality is
in force:

$$
E \sup_{n \ge 1} \Big|\sum_{j=1}^{n} \xi_j\Big|^p \le C_p \sum_{j=1}^{\infty} E|\xi_j|^p.
$$


Problem 7.3.12. Let $X = (X_n)_{n \ge 1}$ be any martingale, with $EX_n = 0$ and $EX_n^2 < \infty$,
for any $n \ge 1$. As a generalization of the inequality established in Problem 4.2.5,
prove that, for every fixed $n \ge 1$, one has

$$
P\Big\{\max_{1 \le k \le n} X_k \ge \varepsilon\Big\} \le \frac{EX_n^2}{\varepsilon^2 + EX_n^2}, \quad \text{for every } \varepsilon \ge 0.
$$


Problem 7.3.13. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables,
with $P\{\xi_n = 1\} = p$ and $P\{\xi_n = -1\} = q$, where $p + q = 1$, $0 < p < 1$, and let
$S_0 = 0$, $S_n = \xi_1 + \ldots + \xi_n$.

Prove that the sequence $((q/p)^{S_n})_{n \ge 0}$ is a martingale and, if $p < q$, then the
following maximal inequality is in force:

$$
P\Big\{\sup_{n \ge 0} S_n \ge k\Big\} \le \Big(\frac{p}{q}\Big)^k .
$$

(Note that the above inequality is trivial if $p \ge q$.)
In addition, prove that when $p < q$ one has

$$
E \sup_{n \ge 0} S_n \le \frac{p}{q-p}.
$$

In fact, the above relations are actually identities, which shows that, for $p < q$,
the random variable $\sup_{n \ge 0} S_n$ has geometric distribution (see [P §2.3, Table 2]), i.e.,

$$
P\Big\{\sup_{n \ge 0} S_n = k\Big\} = \Big(\frac{p}{q}\Big)^k \Big(1 - \frac{p}{q}\Big), \quad k = 0, 1, 2, \ldots .
$$
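
The claims of Problem 7.3.13 can be checked numerically before attempting a proof. The sketch below is only an illustration, not a solution: it assumes the particular value $p = 1/3$ and a finite horizon of 40 steps (both arbitrary choices), and the helper name `max_tail_probs` is hypothetical. It verifies the one-step identity $E(q/p)^{\xi_1} = 1$ behind the martingale property and computes $P\{\max_{n \le 40} S_n \ge k\}$ exactly, which must stay below $(p/q)^k$.

```python
from fractions import Fraction


def max_tail_probs(p, horizon, k_max):
    """Exact values of P{max_{0<=n<=horizon} S_n >= k} for k = 0..k_max."""
    q = 1 - p
    tails = [Fraction(1)]  # k = 0: S_0 = 0, so the maximum is always >= 0
    for k in range(1, k_max + 1):
        # dist[s] = P{S_n = s and level k has not been reached so far}
        dist = {0: Fraction(1)}
        hit = Fraction(0)
        for _ in range(horizon):
            new = {}
            for s, w in dist.items():
                for step, pr in ((1, p), (-1, q)):
                    t = s + step
                    if t >= k:
                        hit += w * pr  # first passage to level k
                    else:
                        new[t] = new.get(t, Fraction(0)) + w * pr
            dist = new
        tails.append(hit)
    return tails


p = Fraction(1, 3)  # illustrative choice with p < q
q = 1 - p

# One-step identity behind the martingale property of (q/p)^{S_n}.
one_step = p * (q / p) + q * (p / q)

tails = max_tail_probs(p, horizon=40, k_max=5)
bounds = [(p / q) ** k for k in range(6)]
for k, (t, b) in enumerate(zip(tails, bounds)):
    print(k, float(t), float(b))
```

Because the horizon is finite, the computed probabilities are strictly smaller than the limiting values $(p/q)^k$; they approach them as the horizon grows.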


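Problem 7.3.14. Let $M = (M_k, \mathcal{F}_k)_{0 \le k \le n}$ be a martingale that starts from 0,
i.e., $M_0 = 0$, and is such that $-a_k \le \Delta M_k \le 1 - a_k$, for $k = 1, \ldots, n$,
where $\Delta M_k = M_k - M_{k-1}$ and $a_k \in [0, 1]$. As a generalization of the result
established in Problem 4.5.14, prove that, for every $0 \le x < q$, with $q = 1 - p$ and
$p = \frac{1}{n} \sum_{k=1}^{n} a_k$, one has

$$
P\{M_n \ge nx\} \le e^{-n\psi(x)},
$$

where

$$
\psi(x) = \ln\Big[\Big(\frac{p+x}{p}\Big)^{p+x} \Big(\frac{q-x}{q}\Big)^{q-x}\Big].
$$

Hint. Use the reasoning mentioned in the hint to Problem 4.5.14, and take into
account the fact that

$$
\begin{aligned}
Ee^{hM_n} &= E\big[e^{hM_{n-1}} E(e^{h\Delta M_n} \mid \mathcal{F}_{n-1})\big] \\
&\le E\big[e^{hM_{n-1}} \big((1 - a_n) e^{-h a_n} + a_n e^{h(1-a_n)}\big)\big].
\end{aligned}
$$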

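Problem 7.3.15. Let $M = (M_k, \mathcal{F}_k)_{k \ge 0}$ be a martingale with $M_0 = 0$, chosen so
that for some non-negative constants $a_k$ and $b_k$ one has

$$
-a_k \le \Delta M_k \le b_k, \quad k \ge 1,
$$

where $\Delta M_k = M_k - M_{k-1}$.

(a) Prove that, for every $x \ge 0$ and every $n \ge 1$, one has

$$
P\{M_n \ge x\} \le \exp\Big\{-\frac{2x^2}{\sum_{k=1}^{n} (a_k + b_k)^2}\Big\},
$$

$$
P\{M_n \le -x\} \le \exp\Big\{-\frac{2x^2}{\sum_{k=1}^{n} (a_k + b_k)^2}\Big\},
$$

which, obviously, implies that

$$
P\{|M_n| \ge x\} \le 2\exp\Big\{-\frac{2x^2}{\sum_{k=1}^{n} (a_k + b_k)^2}\Big\}.
$$

(Compare with the respective inequalities in [P §1.6] and [P §4.5].)

(b) Prove that if $a_k = a$ and $b_k = b$, for all $k \ge 1$ (and, therefore, $-a \le \Delta M_k \le b$,
$k \ge 1$), then the following maximal inequalities are in force: for every $\beta > 0$ and
every $x > 0$ one has

$$
P\{M_n - \beta n \ge x \text{ for some } n\} \le \exp\Big\{-\frac{8x\beta}{(a+b)^2}\Big\}; \qquad (*)
$$

furthermore, for every $\beta > 0$ and every integer $m \ge 1$, one has

$$
\begin{aligned}
P\{M_n \ge \beta n \text{ for some } n \ge m\} &\le \exp\Big\{-\frac{2m\beta^2}{(a+b)^2}\Big\}, \\
P\{M_n \le -\beta n \text{ for some } n \ge m\} &\le \exp\Big\{-\frac{2m\beta^2}{(a+b)^2}\Big\}.
\end{aligned} \qquad (**)
$$

(Compare with the inequalities in [P §4.5].)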

Remark. The inequalities in (a) are known as the Hoeffding-Azuma inequalities.
The generalization given in (b) is due to S. M. Ross and can be found in the book
[107].

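Hint. (a) Given any $c > 0$, one can write

$$
P\{M_n \ge x\} \le e^{-cx} E e^{cM_n}.
$$

Setting $V_n = e^{cM_n}$ we have $V_n = V_{n-1} e^{c\Delta M_n}$, so that

$$
E(V_n \mid M_{n-1}) = V_{n-1} E(e^{c\Delta M_n} \mid M_{n-1}).
$$

Iterating over $n$ and using the assumption $-a_k \le \Delta M_k \le b_k$, one can show that

$$
E e^{cM_n} \le \prod_{k=1}^{n} \frac{b_k e^{-c a_k} + a_k e^{c b_k}}{a_k + b_k} \le \prod_{k=1}^{n} \exp\Big\{\frac{c^2}{8} (a_k + b_k)^2\Big\}.
$$

Consequently,

$$
P\{M_n \ge x\} \le \exp\Big\{-cx + \frac{c^2}{8} \sum_{k=1}^{n} (a_k + b_k)^2\Big\},
$$

and, since $c > 0$ is arbitrary, one can claim that

$$
P\{M_n \ge x\} \le \min_{c > 0} \exp\Big\{-cx + \frac{c^2}{8} \sum_{k=1}^{n} (a_k + b_k)^2\Big\} = \exp\Big\{-\frac{2x^2}{\sum_{k=1}^{n} (a_k + b_k)^2}\Big\}.
$$
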
(b) To prove (*), introduce the variables

$$
V_n = \exp\{c(M_n - x - \beta n)\}, \quad n \ge 0,
$$

and notice that, with $c = 8\beta/(a+b)^2$, the sequence $(V_n)_{n \ge 0}$ is a non-negative
supermartingale. Consequently, for every finite Markov time $\tau(K)$ ($\le K$), one must
have

$$
EV_{\tau(K)} \le EV_0 = e^{-8x\beta/(a+b)^2}.
$$

With $\tau(K) = \min\{n : M_n \ge x + \beta n \text{ or } n = K\}$, this yields
$P\{M_{\tau(K)} \ge x + \beta\tau(K)\} = P\{V_{\tau(K)} \ge 1\} \le EV_{\tau(K)} \le EV_0$, and, as a result,

$$
P\{M_n \ge x + \beta n \text{ for some } n \le K\} \le e^{-8x\beta/(a+b)^2}.
$$

Taking $K \to \infty$ gives the inequality in (*), from which (**) obtains with the
following manipulation:

$$
\begin{aligned}
P\{M_n \ge \beta n \text{ for some } n \ge m\} &\le P\Big\{M_n \ge \frac{m\beta}{2} + \frac{\beta}{2} n \text{ for some } n\Big\} \\
&\le \exp\Big\{-\frac{8 (m\beta/2)(\beta/2)}{(a+b)^2}\Big\} = \exp\Big\{-\frac{2m\beta^2}{(a+b)^2}\Big\}.
\end{aligned}
$$

Problem 7.3.16. Let $M = (M_n, \mathcal{F}_n)_{n \ge 0}$ be any martingale and, given some $\lambda > 0$,
let $\tau = \inf\{n \ge 0 : |M_n| > \lambda\}$, with the understanding $\inf \emptyset = \infty$. Prove that

$$
P\{\tau < \infty\} \le \lambda^{-1} \|M\|_1,
$$

where $\|M\|_1 = \sup_n E|M_n|$.

Problem 7.3.17. With the notation adopted in the previous problem, prove that

$$
\sum_{k=0}^{\infty} E|M_k - M_{k-1}|^2 I\{\tau > k\} \le 2\lambda \|M\|_1,
$$

where $M_{-1} = 0$.


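Problem 7.3.18. Let $M = (M_n, \mathcal{F}_n)_{n \ge 0}$ be any martingale with $M_0 = 0$,
and let $[M] = ([M]_n, \mathcal{F}_n)_{n \ge 1}$ stand for its quadratic variation, i.e., $[M]_n =
\sum_{k=1}^{n} (\Delta M_k)^2$, where $\Delta M_k = M_k - M_{k-1}$. Prove that

$$
E \sup_n |M_n| < \infty \iff E[M]_\infty^{1/2} < \infty. \qquad (*)
$$

Remark. The well known Burkholder-Davis-Gundy inequalities

$$
A_p \big\| [M]_\infty^{1/2} \big\|_p \le \| M_\infty^* \|_p \le B_p \big\| [M]_\infty^{1/2} \big\|_p, \quad p \ge 1,
$$

in which $M_\infty^* = \sup_n |M_n|$ and $A_p$ and $B_p$ are universal constants (compare with
[P §7.3, (27), (30)]; see also [79]), can be viewed as an "$L^p$-refinement" of the
property (*).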


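Problem 7.3.19. Let $M = (M_k, \mathcal{F}_k)_{k \ge 1}$ be any martingale. Prove Burkholder's
inequality: for every $r \ge 2$ there is a universal constant $B_r$, such that

$$
E|M_n|^r \le B_r \Big\{ E\Big[\sum_{k=1}^{n} E\big((\Delta M_k)^2 \mid \mathcal{F}_{k-1}\big)\Big]^{r/2} + E \sup_{1 \le k \le n} |\Delta M_k|^r \Big\},
$$

where $\Delta M_k = M_k - M_{k-1}$, $k \ge 1$, with $M_0 = 0$.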

Problem 7.3.20. (Moment inequalities I.) Let $\xi_1, \xi_2, \ldots$ be any sequence of independent
and identically distributed random variables with $E\xi_1 = 0$ and $E|\xi_1|^r < \infty$,
for some $r \ge 1$, and let $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$. Due to the second
Marcinkiewicz-Zygmund inequality (see [P §7.3, (26)]),

$$
E|S_n|^r \le B_r E\Big(\sum_{i=1}^{n} \xi_i^2\Big)^{r/2},
$$

for some universal constant $B_r$.
By using Minkowski's inequality (see [P §2.6]) with $r \ge 2$, and the $c_r$-inequality
from Problem 2.6.72 with $r < 2$, prove that

$$
E|S_n|^r \le \begin{cases} B_r\, n\, E|\xi_1|^r, & 1 \le r \le 2, \\ B_r\, n^{r/2} E|\xi_1|^r, & r \ge 2. \end{cases}
$$

In particular, with $r \ge 2$ the last relation gives the inequality

$$
E|n^{-1/2} S_n|^r \le B_r E|\xi_1|^r.
$$

In conjunction with the result from Problem 3.4.22, one must have
$\lim_n E|n^{-1/2} S_n|^r = E|Z|^r$, where $Z \sim \mathcal{N}(0, \sigma^2)$, $\sigma^2 = E\xi_1^2$.


Problem 7.3.21. (Moment inequalities II.) Let $\xi_1, \xi_2, \ldots$ be any sequence of
independent and identically distributed random variables, and let $S_0 = 0$ and
$S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$. Let $\tau$ be any Markov time, relative to the filtration
$(\mathcal{F}_n^S)_{n \ge 0}$, defined by $\mathcal{F}_0^S = \{\emptyset, \Omega\}$ and $\mathcal{F}_n^S = \sigma(S_1, \ldots, S_n)$, $n \ge 1$. Prove that:

(a) If $0 < r \le 1$ and $E|\xi_1|^r < \infty$, then
$$ E|S_\tau|^r \le E|\xi_1|^r E\tau. $$

(b) If $1 \le r \le 2$ and $E|\xi_1|^r < \infty$, $E\xi_1 = 0$, then
$$ E|S_\tau|^r \le B_r E|\xi_1|^r E\tau. $$

(c) If $r > 2$ and $E|\xi_1|^r < \infty$, $E\xi_1 = 0$, then
$$ E|S_\tau|^r \le B_r \big[(E\xi_1^2)^{r/2} E\tau^{r/2} + E|\xi_1|^r E\tau\big] \le 2B_r E|\xi_1|^r E\tau^{r/2}, $$
where $B_r$ is a universal constant that depends only on $r$.

Hint. In all cases one must prove the required inequalities first for the "cut-off"
(finite) times $\tau \wedge n$, $n \ge 1$, and then pass to the limit as $n \to \infty$.


Problem 7.3.22. Let $\xi_1, \ldots, \xi_n$ be independent random variables. Prove the
Marcinkiewicz-Zygmund inequality: for every $r > 0$ and every $n \ge 1$, one can
find a constant, $B_r$, which is universal, in that it depends only on $r$, so that one can
write (compare with the second inequality in [P §7.3, (26)])

$$
E\Big|\sum_{j=1}^{n} \xi_j\Big|^{2r} \le B_r\, n^{r-1} \sum_{j=1}^{n} E|\xi_j|^{2r}.
$$

Hint. It is enough to consider only the (much simpler) case where $r \ge 1$ is an
integer.


Problem 7.3.23. Let $(\xi_n)_{n \ge 1}$ be any orthonormal sequence of random variables in
$L^2$ (i.e., $E\xi_i \xi_j = 0$, for $i \ne j$, and $E\xi_i^2 = 1$ for all $i \ge 1$). Prove the
Rademacher-Menshov maximal inequality: for any sequence of real numbers $(c_n)_{n \ge 1}$ and for
any integer $n \ge 1$, one has

$$
E \max_{1 \le k \le n} \Big(\sum_{j=1}^{k} c_j \xi_j\Big)^2 \le \ln^2(4n) \sum_{j=1}^{n} c_j^2.
$$


Problem 7.3.24. Let $(\xi_n)_{n \ge 1}$ be any orthonormal sequence of random variables in
$L^2$, and let $(c_n)_{n \ge 1}$ be any sequence of real numbers with

$$
\sum_{k=1}^{\infty} c_k^2 \ln^2 k < \infty.
$$

Prove that the series $\sum_{k=1}^{\infty} c_k \xi_k$ converges with probability 1.

Hint. Use the result from the previous problem.


Problem 7.3.25. (On the extremality of the class of Bernoulli random variables:
Part I.) Let $\xi_1, \ldots, \xi_n$ be independent Bernoulli random variables with $P\{\xi_i = 1\} =
P\{\xi_i = -1\} = 1/2$.

(a) Prove that, with $p = 2m$ and $m \ge 1$, the second Khinchin inequality in
[P §7.3, (25)] can be written in the form: for every $n \ge 1$ and every family,
$X_1, \ldots, X_n$, of independent standard normal ($\mathcal{N}(0, 1)$) random variables, one has

$$
E\Big|\sum_{k=1}^{n} a_k \xi_k\Big|^{2m} \le E\Big|\sum_{k=1}^{n} a_k X_k\Big|^{2m}.
$$

(b) Let $\mathcal{X}_n$ denote the class of independent and identically distributed symmetric
random variables $X_1, \ldots, X_n$, with $DX_i = 1$, $i = 1, \ldots, n$. Prove that, for every
$n \ge 1$ and every $m \ge 1$, one has

$$
E\Big|\sum_{k=1}^{n} a_k \xi_k\Big|^{2m} \le E\Big|\sum_{k=1}^{n} a_k X_k\Big|^{2m}.
$$

Hint. (a) It is enough to prove that

$$
E\Big|\sum_{k=1}^{n} a_k \xi_k\Big|^{2m} = \sum_{\substack{k_1 + \ldots + k_n = m \\ k_i \ge 0}} \frac{(2m)!}{(2k_1)! \ldots (2k_n)!}\, |a_1|^{2k_1} \ldots |a_n|^{2k_n},
$$

$$
E\Big|\sum_{k=1}^{n} a_k X_k\Big|^{2m} = \frac{(2m)!}{2^m m!} \sum_{\substack{k_1 + \ldots + k_n = m \\ k_i \ge 0}} \frac{m!}{k_1! \ldots k_n!}\, |a_1|^{2k_1} \ldots |a_n|^{2k_n},
$$

and that $2^m k_1! \ldots k_n! \le (2k_1)! \ldots (2k_n)!$, if $k_1 + \ldots + k_n = m$ and $k_i \ge 0$. (Note
that $\frac{(2m)!}{2^m m!} = (2m-1)!! = EX_1^{2m}$; see Problem 2.8.9.)

(b) With $m = 1$, the required inequality is obvious. In the case $m \ge 2$, one must
prove first that the function $g(t) = E|x + \sqrt{t}\, \xi_1|^{2m}$ is convex in the domain $t \ge 0$.
Next, by using Jensen's inequality for the associated conditional expectations,
prove that, if the sequences $(\xi_1, \ldots, \xi_n)$ and $(X_1, \ldots, X_n)$ are independent, then
the following inequality must be in force:

$$
E\Big|\sum_{k=1}^{n} a_k \xi_k\Big|^{2m} \le E\Big|\sum_{k=1}^{n} a_k \xi_k |X_k|\Big|^{2m}.
$$

Finally, prove that

$$
(\xi_1 |X_1|, \ldots, \xi_n |X_n|) \overset{\text{law}}{=} (X_1, \ldots, X_n).
$$


Problem 7.3.26. (On the extremality of the class of Bernoulli random variables:
Part II.) Let $X_1, \ldots, X_n$ be independent random variables, such that $P\{0 \le X_i \le
1\} = 1$ and $EX_i = p_i$, $i = 1, \ldots, n$. In addition, let $\xi_1, \ldots, \xi_n$ be independent
and identically distributed Bernoulli random variables with $P\{\xi_i = 1\} = p$ and
$P\{\xi_i = 0\} = 1 - p$, where $p = (p_1 + \ldots + p_n)/n$. Prove the Bentkus inequality:
for every $n \ge 1$ and every $x = 0, 1, 2, \ldots$, one has

$$
P\{X_1 + \ldots + X_n \ge x\} \le e\, P\{\xi_1 + \ldots + \xi_n \ge x\},
$$

where $e = 2.718\ldots$.
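
The Bentkus inequality of Problem 7.3.26 is easy to test in the special case where the $X_i$ are themselves Bernoulli with different success probabilities $p_i$. The sketch below assumes that special case only (general $[0,1]$-valued variables are not covered), uses an arbitrary illustrative vector `ps`, and the helper names are hypothetical. It compares the exact upper tails of the Poisson-binomial sum with $e$ times the tails of the Binomial$(n, p)$ law.

```python
from math import comb, e


def poisson_binomial_pmf(ps):
    """pmf of X_1 + ... + X_n for independent X_i ~ Bernoulli(ps[i])."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for s, w in enumerate(pmf):
            new[s] += w * (1 - p)   # X_i = 0
            new[s + 1] += w * p     # X_i = 1
        pmf = new
    return pmf


def upper_tails(pmf):
    """tails[x] = P{sum >= x} for x = 0..n."""
    tails, acc = [], 0.0
    for w in reversed(pmf):
        acc += w
        tails.append(acc)
    return tails[::-1]


ps = [0.05, 0.2, 0.5, 0.7, 0.95]   # illustrative p_i (an assumption)
n = len(ps)
p_bar = sum(ps) / n                # p = (p_1 + ... + p_n) / n

lhs = upper_tails(poisson_binomial_pmf(ps))
rhs = upper_tails([comb(n, k) * p_bar ** k * (1 - p_bar) ** (n - k)
                   for k in range(n + 1)])
for x in range(n + 1):
    print(x, round(lhs[x], 5), round(e * rhs[x], 5))
```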


7.4 Convergence Theorems for Submartingales
and Martingales


Problem 7.4.1. Let $\{\mathcal{G}_n, n \ge 1\}$ be some non-increasing family of $\sigma$-algebras (i.e.,
$\mathcal{G}_1 \supseteq \mathcal{G}_2 \supseteq \ldots$), let $\mathcal{G}_\infty = \bigcap_n \mathcal{G}_n$, and let $\eta$ be some integrable random variable.
Prove the following analog of [P §7.4, Theorem 3]:

$$
E(\eta \mid \mathcal{G}_n) \to E(\eta \mid \mathcal{G}_\infty) \quad \text{as } n \to \infty \quad \text{(P-a.e. and in the $L^1$-sense)}.
$$

Hint. Let $\beta_n(a, b)$ denote the number of downcrossings of the interval $(a, b)$ for
the sequence $M = (M_k)_{1 \le k \le n}$, given by $M_k = E(\eta \mid \mathcal{G}_k)$. Show first that

$$
E\beta_\infty(a, b) \le \frac{E|\eta| + |a|}{b - a} < \infty
$$

and conclude that $\beta_\infty(a, b) < \infty$ (P-a.e.). The rest of the proof is similar to the
proofs of [P §7.4, Theorems 1 and 3].


Problem 7.4.2. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E|\xi_1| < \infty$ and $E\xi_1 = m$, and let $S_n = \xi_1 + \cdots + \xi_n$,
$n \ge 1$. Prove that (see Problem 2.7.2)

$$
E(\xi_1 \mid S_n, S_{n+1}, \ldots) = E(\xi_1 \mid S_n) = \frac{S_n}{n} \quad \text{(P-a.e.)}, \quad n \ge 1.
$$

By using the result from Problem 7.4.1, prove the strong law of large numbers: as
$n \to \infty$ one has

$$
\frac{S_n}{n} \to m \quad \text{(P-a.e. and in the $L^1$-sense)}.
$$

Hint. Given any $B \in \sigma(S_n, S_{n+1}, \ldots)$, show that $EI_B \xi_1 = EI_B \xi_i$, $i \le n$, and
conclude that

$$
E(S_n \mid S_n, S_{n+1}, \ldots) = nE(\xi_1 \mid S_n, S_{n+1}, \ldots);
$$

in particular, $E(\xi_1 \mid S_n, S_{n+1}, \ldots) = \frac{S_n}{n}$ (P-a.e.). In order to prove that $\frac{S_n}{n} \to m$
(P-a.e. and in the $L^1$-sense), consider the $\sigma$-algebra $\mathcal{X}(S) = \bigcap_{n=1}^{\infty} \sigma(S_n, S_{n+1}, \ldots)$
and, using the result from Problem 7.4.1, conclude that $\frac{S_n}{n} \to E(\xi_1 \mid \mathcal{X}(S))$ (P-a.e.
and in the $L^1$-sense). Finally, use the fact that the events $A \in \mathcal{X}(S)$ obey the
Hewitt-Savage 0-1 law ([P §2.1, Theorem 3]).


Problem 7.4.3. Prove the following result, which combines H. Lebesgue's dominated
convergence theorem and P. Lévy's theorem. Let $(\xi_n)_{n \ge 1}$ be any sequence
of random variables, such that $\xi_n \to \xi$ (P-a.e.) and $|\xi_n| \le \eta$, for some random
variable $\eta$ with $E\eta < \infty$. Let $(\mathcal{F}_m)_{m \ge 1}$ be any non-decreasing family of $\sigma$-algebras,
and let $\mathcal{F}_\infty = \sigma(\bigcup_m \mathcal{F}_m)$. Then one has (P-a.e.)

$$
\lim_{m, n \to \infty} E(\xi_n \mid \mathcal{F}_m) = E(\xi \mid \mathcal{F}_\infty).
$$

Hint. Use Lebesgue's dominated convergence theorem ([P §2.6, Theorem 3])
and P. Lévy's theorem ([P §7.4, Theorem 3]) to estimate, for large $n$ and $m$, the
terms in the right side of the representation:

$$
E(\xi_n \mid \mathcal{F}_m) - E(\xi \mid \mathcal{F}_\infty) = [E(\xi_n \mid \mathcal{F}_m) - E(\xi \mid \mathcal{F}_m)] + [E(\xi \mid \mathcal{F}_m) - E(\xi \mid \mathcal{F}_\infty)].
$$


Problem 7.4.4. Prove formula [P §7.4, (12)].

Hint. Notice first that the system $\{H_1(x), \ldots, H_n(x)\}$ is a basis in the space
of functions that are measurable for $\mathcal{F}_n = \sigma(H_1, \ldots, H_n)$. As $\mathcal{F}_n$ has finitely
many elements, every function that is measurable for $\mathcal{F}_n$ is automatically simple
(see [P §2.4, Lemma 3]). As a result, formula [P §7.4, (12)] must hold for some
constants $a_1, \ldots, a_n$. The fact that $a_k = (f, H_k)$ follows from the orthonormality of
the basis $\{H_1(x), \ldots, H_n(x)\}$.


Problem 7.4.5. Let $\Omega = [0, 1)$, $\mathcal{F} = \mathcal{B}([0, 1))$, let P stand for the Lebesgue
measure, and suppose that the function $f = f(x)$ belongs to $L^1$. Prove that
$f_n(x) \to f(x)$ (P-a.e.), for

$$
f_n(x) = 2^n \int_{k2^{-n}}^{(k+1)2^{-n}} f(y)\, dy, \quad k2^{-n} \le x < (k+1)2^{-n}.
$$

Hint. The main step in the proof is to show that $(f_n(x), \mathcal{F}_n)_{n \ge 1}$, with $\mathcal{F}_n =
\sigma([j2^{-n}, (j+1)2^{-n}),\ j = 0, 1, \ldots, 2^n - 1)$, forms a martingale. The result from
[P §7.4, Theorem 1] will then conclude the proof.
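
The martingale structure mentioned in the hint to Problem 7.4.5 can be seen concretely. The sketch below assumes the illustrative choice $f(y) = y^2$ (not taken from the text), for which the dyadic averages have a closed form; the function names are hypothetical. It checks that the value on each level-$n$ cell is the mean of the values on its two level-$(n+1)$ halves, and that $f_n(x)$ approaches $f(x)$ at the sample point $x = 1/3$.

```python
from fractions import Fraction


def dyadic_average(n, k):
    """2^n * integral of y^2 over [k 2^-n, (k+1) 2^-n), computed exactly."""
    lo, hi = Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)
    return 2 ** n * (hi ** 3 - lo ** 3) / 3


def f_n(n, x):
    """Value of the n-th dyadic average at a point x in [0, 1)."""
    k = int(x * 2 ** n)  # index of the dyadic cell that contains x
    return dyadic_average(n, k)


# Martingale property: conditioning level n+1 on level n averages the two
# equally likely child cells, which must reproduce the parent value.
martingale_ok = all(
    dyadic_average(n, k)
    == (dyadic_average(n + 1, 2 * k) + dyadic_average(n + 1, 2 * k + 1)) / 2
    for n in range(1, 6)
    for k in range(2 ** n)
)

x = Fraction(1, 3)
errors = [abs(f_n(n, x) - x ** 2) for n in range(1, 12)]
print(martingale_ok, [float(err) for err in errors])
```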


Problem 7.4.6. Let $\Omega = [0, 1)$, $\mathcal{F} = \mathcal{B}([0, 1))$, let P stand for the Lebesgue
measure and suppose that the function $f = f(x)$ belongs to $L^1$. Assuming that the
function $f = f(x)$ is extended to the interval $[0, 2)$ by periodicity in the obvious
way, and setting

$$
f_n(x) = \sum_{i=1}^{2^n} 2^{-n} f(x + i2^{-n}),
$$

prove that

$$
f_n(x) \to f(x) \quad \text{(P-a.e.)}.
$$

Hint. Just as in the previous problem, the key step is to show that the sequence
$(f_n(x), \mathcal{F}_n)_{n \ge 1}$, with analogously defined $\sigma$-algebras $(\mathcal{F}_n)_{n \ge 1}$, forms a martingale.

Problem 7.4.7. Prove that [P §7.4, Theorem 1] remains valid for generalized
submartingales $X = (X_n, \mathcal{F}_n)$, for which

$$
\inf_m \sup_{n \ge m} E(X_n^+ \mid \mathcal{F}_m) < \infty \quad \text{(P-a.e.)}.
$$


Problem 7.4.8. Let $(a_n)_{n \ge 1}$ be some sequence of real numbers with the property:
for some $\delta > 0$, one can claim that the limit $\lim_n e^{ita_n}$ exists for any $t \in (-\delta, \delta)$.
Prove that $\lim_n a_n$ also exists and is finite.

Hint. The existence of $\lim_n e^{ita_n}$ for every $t \in (-\delta, \delta)$ is tantamount to the
existence of $\lim_n e^{ita_n}$ for all $t \in R$. Thus, it suffices to prove that the function
$f(t) = \lim_n e^{ita_n}$ can be written in the form $e^{itc}$, for some finite constant $c$. This
last property can be derived from the following properties of the function $f(t)$:

(i) $|f(t)| = 1$, $t \in R$;

(ii) $f(t_1 + t_2) = f(t_1) f(t_2)$, $t_1, t_2 \in R$;

(iii) the set of continuity points for the function $f(t)$ is everywhere dense in $R$.

Problem 7.4.9. Let $F = F(x)$, $x \in R$, be some distribution function, and let
$\alpha \in (0, 1)$ be chosen so that, for some $\theta \in R$, one can write $F(\theta) = \alpha$. Define the
sequence of random variables $X_1, X_2, \ldots$ according to the following rule (known
as the Robbins-Monro procedure):

$$
X_{n+1} = X_n - n^{-1}(Y_n - \alpha),
$$

where $Y_1, Y_2, \ldots$ are random variables, defined in such a way that

$$
P(Y_n = y \mid X_1, \ldots, X_n; Y_1, \ldots, Y_{n-1}) = \begin{cases} F(X_n), & \text{if } y = 1, \\ 1 - F(X_n), & \text{if } y = 0, \end{cases}
$$

with the understanding that, for $n = 1$, the conditional probability in the left side is
to be replaced by $P(Y_1 = y)$.
Prove the following result from stochastic approximation theory: in the Robbins-Monro
procedure one has $E|X_n - \theta|^2 \to 0$ as $n \to \infty$.
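
A short simulation can make the Robbins-Monro procedure of Problem 7.4.9 tangible. The sketch below is an illustration under stated assumptions, not part of the problem: it takes the recursion in the form $X_{n+1} = X_n - n^{-1}(Y_n - \alpha)$, assumes $F$ is the uniform distribution function on $[0, 1]$ (so that $\theta = \alpha$), and uses an arbitrary starting point and seed. The names `robbins_monro` and `F_uniform` are hypothetical.

```python
import random


def robbins_monro(F, alpha, x1, n_steps, rng):
    """Iterate X_{n+1} = X_n - (Y_n - alpha)/n with P{Y_n = 1 | past} = F(X_n)."""
    x = x1
    for n in range(1, n_steps + 1):
        y = 1 if rng.random() < F(x) else 0  # Bernoulli response with mean F(x)
        x -= (y - alpha) / n
    return x


def F_uniform(x):
    """Distribution function of the uniform law on [0, 1] (illustrative choice)."""
    return min(1.0, max(0.0, x))


alpha = 0.3   # target level
theta = 0.3   # root of F_uniform(theta) = alpha
rng = random.Random(12345)
runs = [robbins_monro(F_uniform, alpha, x1=0.9, n_steps=20000, rng=rng)
        for _ in range(20)]
mse = sum((x - theta) ** 2 for x in runs) / len(runs)
print(mse)
```

The empirical mean squared error over the 20 runs is an estimate of $E|X_n - \theta|^2$ at $n = 20000$; rerunning with larger `n_steps` should make it smaller.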


Problem 7.4.10. Let $X = (X_n, \mathcal{F}_n)_{n \ge 1}$ be a submartingale, for which one can
claim that

$$
E(X_\tau I(\tau < \infty)) \ne \infty,
$$

for every stopping time $\tau$. Prove that the limit $\lim X_n$ exists with probability 1.


Problem 7.4.11. Let $X = (X_n, \mathcal{F}_n)_{n \ge 1}$ be a martingale and let
$\mathcal{F}_\infty = \sigma(\bigcup_n \mathcal{F}_n)$. Prove that if the sequence $(X_n)_{n \ge 1}$ is uniformly integrable,
then the limit $X_\infty = \lim_n X_n$ exists (P-a.e.), and the "closed" sequence
$\overline{X} = (X_n, \mathcal{F}_n)_{1 \le n \le \infty}$ is a martingale.

Problem 7.4.12. Suppose that $X = (X_n, \mathcal{F}_n)_{n \ge 1}$ is a submartingale and let
$\mathcal{F}_\infty = \sigma(\bigcup_{n=1}^{\infty} \mathcal{F}_n)$. Prove that if the sequence $(X_n^+)_{n \ge 1}$ is uniformly integrable,
then the limit $X_\infty = \lim_n X_n$ exists (P-a.e.), and the "closed" sequence $\overline{X} =
(X_n, \mathcal{F}_n)_{1 \le n \le \infty}$ is a submartingale.


Problem 7.4.13. [P §7.4, Corollary 1 to Theorem 1] states that, for any non-negative
supermartingale $X$, one can claim that the limit $X_\infty = \lim X_n$ exists and is
finite with probability 1. Prove that the following properties are also in force:

(a) $E(X_\infty \mid \mathcal{F}_n) \le X_n$ (P-a.e.), $n \ge 1$;

(b) $EX_\infty \le \lim_n EX_n$;

(c) $E(X_\tau \mid \mathcal{F}_\sigma) \le X_{\tau \wedge \sigma}$ for arbitrary stopping times $\tau$ and $\sigma$;

(d) $Eg(X_\infty) = \lim_n Eg(X_n)$, for any continuous function $g = g(x)$, $x \ge 0$,
with $g(x)/x \to 0$ as $x \to \infty$;

(e) if $g(x) > g(0) = 0$ for all $x > 0$, then

$$
X_\infty = 0 \iff \lim_n Eg(X_n) = 0;
$$

(f) for every given $0 < p < 1$, one has

$$
P\{X_\infty = 0\} = 1 \iff \lim_n EX_n^p = 0.
$$


Problem 7.4.14. In P. Lévy's convergence theorem ([P §7.4, Theorem 3]) it is
assumed that $E|\xi| < \infty$. Prove by way of example that the requirement for $E\xi$ to
exist ($\min(E\xi^+, E\xi^-) < \infty$) alone, in other words, without insisting that $E\xi^+ +
E\xi^- < \infty$, cannot guarantee the convergence $E(\xi \mid \mathcal{F}_n) \to E(\xi \mid \mathcal{F}_\infty)$ (P-a.e.).


Problem 7.4.15. If $X = (X_n, \mathcal{F}_n)_{n \ge 1}$ is a martingale with $\sup_n E|X_n| < \infty$, then,
according to [P §7.4, Theorem 1], $\lim X_n$ must exist with probability 1. Give an
example of a martingale $X$, for which $\sup_n E|X_n| = \infty$ and $\lim X_n$ does not exist
with probability 1.


Problem 7.4.16. Give an example of a martingale, $(X_n)_{n \ge 0}$, for which one has
$X_n \to -\infty$ as $n \to \infty$ with probability 1.


Problem 7.4.17. According to [P §7.4, Theorem 2], given any uniformly integrable
submartingale (supermartingale) $X = (X_n, \mathcal{F}_n)_{n \ge 1}$, one can find a
"terminal" random variable $X_\infty$, such that $X_n \to X_\infty$ (P-a.e.). Give an example
of a submartingale (supermartingale) for which the "terminal" variable $X_\infty$, with
$X_n \to X_\infty$ (P-a.e.), exists, but the sequence $(X_n)_{n \ge 1}$ is not uniformly integrable.


Problem 7.4.18. Prove that any martingale, $X = (X_n)_{n \ge 0}$, that has the property

$$
\sup_n E(|X_n| \ln^+ |X_n|) < \infty,
$$

must be a Lévy martingale.

Problem 7.4.19. Give an example of a non-negative martingale,
$X = (X_n, \mathcal{F}_n)_{n \ge 1}$,
such that $EX_n = 1$ for all $n \ge 1$, $X_n(\omega) \to 0$ as $n \to \infty$ for any $\omega$, and yet
$E \sup_n X_n = \infty$.


Problem 7.4.20. Assuming that $X = (X_n, \mathcal{F}_n)_{n \ge 1}$ is a uniformly integrable
submartingale, prove that, for any Markov time $\tau$, one has

$$
E(X_\infty \mid \mathcal{F}_\tau) \ge X_\tau \quad \text{(P-a.e.)},
$$

where $X_\infty$ stands for $\lim X_n$, which, according to Problem 7.4.12, exists with
probability 1.


Problem 7.4.21. (On [P §7.4, Theorem 1].) Give an example of a supermartingale,
$X = (X_n, \mathcal{F}_n)_{n \ge 1}$, which satisfies the condition $\sup_n E|X_n| < \infty$, and, therefore,
$\lim X_n$ ($= X_\infty$) exists with probability 1, and yet $X_n \not\to X_\infty$ in $L^1$.


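Problem 7.4.22. Argue that, given any square integrable martingale, $M =
(M_n, \mathcal{F}_n)_{n \ge 1}$, the condition

$$
\sum_{k \ge 1} E(M_k - M_{k-1})^2 < \infty,
$$

or, equivalently, $E\langle M \rangle_\infty < \infty$, where $\langle M \rangle_\infty = \lim_n \langle M \rangle_n$, guarantees the
convergence $M_n \to M_\infty$ (P-a.e.), and also the convergence $M_n \overset{L^1}{\to} M_\infty$, for some
random variable $M_\infty$, with $EM_\infty^2 < \infty$.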


Problem 7.4.23. Let $X = (X_n, \mathcal{F}_n)_{n \ge 0}$ be any submartingale. By the very definition
of a submartingale, one must have $E|X_n| < \infty$, for every $n \ge 0$. Sometimes this
condition is relaxed, by requiring only that EX7 < oo, forn > 0. Which of the 
properties of the general class of submartingales, listed in [ P §7.4, 2-4], remain 
valid under this weaker notion of submartingale? 


Problem 7.4.24. Suppose that $X = (X_n, \mathcal{F}_n)_{n \ge 0}$ is a supermartingale, i.e., $X_n$ is
$\mathcal{F}_n$-measurable, $E|X_n| < \infty$ and $E(X_{n+1} \mid \mathcal{F}_n) \le X_n$, for $n \ge 0$. According to
[P §7.4, Theorem 1], if $\sup_n E|X_n| < \infty$, then one can claim that with probability 1
the limit $\lim_n X_n = X_\infty$ exists and $E|X_\infty| < \infty$.

Notice, however, that the condition $E(X_{n+1} \mid \mathcal{F}_n) \le X_n$ is meaningful even without
the requirement $E|X_{n+1}| < \infty$, as the conditional expectation $E(X_{n+1} \mid \mathcal{F}_n)$
would be well defined if, for example, $X_{n+1} \ge 0$, although in this case $E(X_{n+1} \mid \mathcal{F}_n)$
may take the value $+\infty$ on some non-negligible set.

In light of the last observation, we say that $X = (X_n, \mathcal{F}_n)_{n \ge 0}$ is a non-negative
supermartingale sequence, if, for every $n \ge 0$, one can claim that $X_n$ is $\mathcal{F}_n$-measurable,
$P\{X_n \ge 0\} = 1$ and

$$
E(X_{n+1} \mid \mathcal{F}_n) \le X_n \quad \text{(P-a.e.)}.
$$

Prove that for any non-negative supermartingale sequence,
$X = (X_n, \mathcal{F}_n)_{n \ge 0}$,
the limit $\lim_n X_n$ ($= X_\infty$) exists with probability 1 and, furthermore, if $P\{X_0 <
\infty\} = 1$, then $P\{X_\infty < \infty\} = 1$.

Hint. The proof is analogous to the proof of [P §7.4, Theorem 1] and hinges
on the estimate (37) from [P §7.3, Theorem 5] for the number of up-crossings of a
given interval.


Problem 7.4.25. (Continuation of Problem 2.2.15.) As was shown in Problem 2.2.15,
the following relation between $\sigma$-algebras does not hold in general:

$$
\bigcap_n \sigma(\mathcal{G}, \mathcal{E}_n) = \sigma\Big(\mathcal{G}, \bigcap_n \mathcal{E}_n\Big).
$$

Show, however, that the last relation is guaranteed by the following condition:

the $\sigma$-algebras $\mathcal{G}$ and $\mathcal{E}_1$ are conditionally independent, relative to the $\sigma$-algebra $\mathcal{E}_n$, for
every $n \ge 1$, i.e., one has (P-a.e.)

$$
P(A \cap B \mid \mathcal{E}_n) = P(A \mid \mathcal{E}_n) P(B \mid \mathcal{E}_n),
$$

for any $A \in \mathcal{G}$ and $B \in \mathcal{E}_1$.

Hint. It is enough to show that, for every $\mathcal{G} \vee \mathcal{E}_1$-measurable and bounded random
variable $X$, one has (P-a.e.)

$$
E\Big(X \,\Big|\, \bigcap_n (\mathcal{G} \vee \mathcal{E}_n)\Big) = E\Big(X \,\Big|\, \mathcal{G} \vee \bigcap_n \mathcal{E}_n\Big).
$$

Furthermore, it would be enough to consider only random variables $X$ of the form

$$
X = X_1 X_2,
$$

where the bounded variables $X_1$ and $X_2$ are such that $X_1$ is $\mathcal{E}_1$-measurable and $X_2$
is $\mathcal{G}$-measurable. Finally, use the $L^1$-convergence established in Problem 7.4.1, in
conjunction with the conditional independence assumed above.


Problem 7.4.26. Let $\xi_1, \xi_2, \ldots$ be independent non-negative random variables, with
$E\xi_i \le 1$ and $P\{\xi_i = 1\} < 1$. For $M_n = \xi_1 \ldots \xi_n$, $n \ge 1$, prove that $M_n \to 0$ as
$n \to \infty$ (P-a.e.).

Hint. Use the fact that the sequence $(M_n)_{n \ge 1}$ forms a non-negative supermartingale.


Problem 7.4.27. Let $(\Omega, (\mathcal{F}_i)_{i \ge 0}, P)$ be any filtered probability space with $\mathcal{F}_0 =
\{\emptyset, \Omega\}$, and let $\xi_1, \xi_2, \ldots$ be any sequence of random variables, chosen so that each
$\xi_i$ is $\mathcal{F}_i$-measurable. Assuming that $\sup_i E|\xi_i|^\alpha < \infty$, for some $\alpha \in (1, 2]$, prove
that

$$
\frac{1}{n} \sum_{i=1}^{n} \big(\xi_i - E(\xi_i \mid \mathcal{F}_{i-1})\big) \to 0.
$$

(Compare with the law of large numbers, established in [P §4.3], and the ergodic
theorems of [P §5.3].)


7.5 On the Sets of Convergence of Submartingales 
and Martingales 


Problem 7.5.1. Prove that any submartingale, $X = (X_n, \mathcal{F}_n)$, that satisfies the
condition $E \sup_n |X_n| < \infty$, must belong to the class $C^+$.


Problem 7.5.2. Prove that [ P §7.5, Theorems 1 and 2] remain valid for generalized 
submartingales. 


Problem 7.5.3. Prove that, for any generalized submartingale, $X = (X_n, \mathcal{F}_n)$, up
to a P-negligible set, one has the inclusion:

$$
\Big\{\inf_m \sup_{n \ge m} E(X_n^+ \mid \mathcal{F}_m) < \infty\Big\} \subseteq \{X_n \text{ converges}\}.
$$


Problem 7.5.4. Prove that the corollary to [ P §7.5, Theorem 1] remains valid for 
generalized submartingales. 


Problem 7.5.5. Prove that any generalized submartingale from the class $C^+$ is
automatically a local submartingale.


Problem 7.5.6. Consider the sequence $a_n > 0$, $n \ge 1$, and let $b_n = \sum_{k=1}^{n} a_k$.
Prove that

$$
\sum_{n=1}^{\infty} \frac{a_n}{b_n^2} < \infty.
$$

Hint. Consider separately the cases: $\sum_{n=1}^{\infty} a_n < \infty$ and $\sum_{n=1}^{\infty} a_n = \infty$.
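
The two cases in the hint to Problem 7.5.6 can be explored numerically. The sketch below uses two illustrative sequences chosen here (they are assumptions, not part of the problem): $a_n = 1$, for which $\sum a_n = \infty$, and $a_n = 2^{-n}$, for which $\sum a_n < \infty$. In both cases the partial sums of $a_n / b_n^2$ stay below $2/a_1$, which is what the telescoping estimate $a_n / b_n^2 \le 1/b_{n-1} - 1/b_n$ ($n \ge 2$) suggests.

```python
from fractions import Fraction
from itertools import accumulate


def partial_sums(a):
    """Partial sums of a_n / b_n^2, where b_n = a_1 + ... + a_n."""
    b = list(accumulate(a))
    return list(accumulate(an / bn ** 2 for an, bn in zip(a, b)))


# Case sum a_n = infinity: a_n = 1, so a_n / b_n^2 = 1 / n^2.
divergent = partial_sums([Fraction(1)] * 200)

# Case sum a_n < infinity: a_n = 2^{-n}.
convergent = partial_sums([Fraction(1, 2 ** n) for n in range(1, 60)])

print(float(divergent[-1]), float(convergent[-1]))
```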


Problem 7.5.7. Let $\xi_1, \xi_2, \ldots$ be any sequence of uniformly bounded random
variables, i.e., $|\xi_n| \le c$, for $n \ge 1$. Prove that the series $\sum_{n=1}^{\infty} \xi_n$ and
$\sum_{n=1}^{\infty} E(\xi_n \mid \xi_1, \ldots, \xi_{n-1})$ either simultaneously converge or simultaneously diverge
(P-a.e.).


Problem 7.5.8. Let $X = (X_n)_{n \ge 0}$ be any martingale, with the property $\Delta X_n =
X_n - X_{n-1} \le c$ (P-a.e.), for some constant $c < \infty$ ($\Delta X_0 = X_0$). Prove that the sets
$\{X_n \text{ converges}\}$ and $\{\sup_n X_n < \infty\}$ can differ only by a P-negligible set.


Problem 7.5.9. Let $X = (X_n, \mathcal{F}_n)_{n \ge 0}$ be any martingale, with
$\sup_{n \ge 0} E|X_n| < \infty$.
Prove that $\sum_{n \ge 1} (\Delta X_n)^2 < \infty$ (P-a.e.). (Compare with Problem 7.3.18.)

Problem 7.5.10. Let $X = (X_n, \mathcal{F}_n)_{n \ge 0}$ be any martingale, with
$E \sup_{n \ge 1} |\Delta X_n| < \infty$.
Prove that, up to a P-negligible set,

$$
\Big\{\sum_{n=1}^{\infty} (\Delta X_n)^2 < \infty\Big\} \subseteq \{X_n \text{ converges}\}.
$$

In particular, if $E\big(\sum_{n=1}^{\infty} (\Delta X_n)^2\big)^{1/2} < \infty$, then one can claim that the sequence
$(X_n)_{n \ge 0}$ converges with probability 1.


Problem 7.5.11. Let $X = (X_n, \mathcal{F}_n)_{n \ge 0}$ be any martingale with
$\sup_{n \ge 0} E|X_n| < \infty$,
and let $Y = (Y_n, \mathcal{F}_{n-1})_{n \ge 1}$ be any predictable sequence with
$\sup_{n \ge 1} |Y_n| < \infty$ (P-a.e.).
Prove that the series $\sum_{n=1}^{\infty} Y_n \Delta X_n$ converges (P-a.e.).


Problem 7.5.12. Consider the martingale $X = (X_n, \mathcal{F}_n)_{n \ge 0}$, chosen so that
$\sup_\tau E(|\Delta X_\tau| I(\tau < \infty)) < \infty$, the sup being taken over all finite stopping times $\tau$.
Prove that, up to a P-negligible set, one has


CLO < oo} C {Xn > o0}. 


Problem 7.5.13. Let $M = (M_n, \mathcal{F}_n)$ be any square-integrable martingale. Prove
that, for almost every $\omega$ from the set $\{\langle M \rangle_\infty = \infty\}$, one has

$$
\lim_n \frac{M_n}{\langle M \rangle_n} = 0.
$$


7.6 Absolute Continuity and Singularity of Probability 
Distributions on Measurable Spaces with Filtrations 


Problem 7.6.1. Prove the inequality in [ P §7.6, (6)]. 
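
Problem 7.6.2. Let $\tilde{P}_n \sim P_n$, for $n \ge 1$. Prove that:

$$
\tilde{P} \sim P \iff \tilde{P}\{z_\infty < \infty\} = P\{z_\infty > 0\} = 1;
$$

$$
\tilde{P} \perp P \iff \tilde{P}\{z_\infty = \infty\} = 1 \ \text{ or } \ P\{z_\infty = 0\} = 1.
$$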

Problem 7.6.3. Let $\tilde{P}_n \ll P_n$, $n \ge 1$, suppose that $\tau$ is a stopping time (relative to
the filtration $(\mathcal{F}_n)$), and let $\tilde{P}_\tau = \tilde{P} \mid \mathcal{F}_\tau$ and $P_\tau = P \mid \mathcal{F}_\tau$ denote, respectively, the
restrictions of the measures $\tilde{P}$ and $P$ to the $\sigma$-algebra $\mathcal{F}_\tau$. Prove that $\tilde{P}_\tau \ll P_\tau$ if
and only if $\{\tau = \infty\} = \{z_\infty < \infty\}$ ($\tilde{P}$-a.e.). In particular, this result implies that, if
$\tilde{P}\{\tau < \infty\} = 1$, then $\tilde{P}_\tau \ll P_\tau$.


Problem 7.6.4. Prove the "conversion formulas" [P §7.6, (21) and (22)].

Hint. Show directly that, for every $A \in \mathcal{F}_{n-1}$, one has

$$
E[I_A \tilde{E}(\eta \mid \mathcal{F}_{n-1}) z_{n-1}] = E[I_A \eta z_n].
$$

As for the proof of the second formula, it is enough to notice that

$$
\tilde{P}\{z_{n-1} = 0\} = 0.
$$


Problem 7.6.5. Prove the estimates in [ P §7.6, (28), (29) and (32)]. 
Problem 7.6.6. Prove the relation [ P §7.6, (34)]. 


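Problem 7.6.7. Suppose that the sequences

$$
\xi = (\xi_1, \xi_2, \ldots) \quad \text{and} \quad \tilde{\xi} = (\tilde{\xi}_1, \tilde{\xi}_2, \ldots),
$$

introduced in [P §7.6, 2], consist of independent and identically distributed random
variables.

(a) Prove that if $P_{\tilde{\xi}_1} \ll P_{\xi_1}$, then $\tilde{P} \ll P$ if and only if the measures $P_{\tilde{\xi}_1}$ and $P_{\xi_1}$
coincide. Furthermore, if $P_{\tilde{\xi}_1} \ll P_{\xi_1}$ and $P_{\tilde{\xi}_1} \ne P_{\xi_1}$, then $\tilde{P} \perp P$.

(b) Prove that if $P_{\tilde{\xi}_1} \sim P_{\xi_1}$, then the following dichotomy is in force: one has
either $\tilde{P} = P$ or $\tilde{P} \perp P$ (compare with the Kakutani Dichotomy Theorem, [P §6.7,
Theorem 3]).
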

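Problem 7.6.8. Let $P$ and $\tilde{P}$ be any two probability measures on the filtered space
$(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 1})$. Let $\tilde{P} \overset{loc}{\ll} P$ (i.e., $\tilde{P}_n \ll P_n$, for all $n \ge 1$, where $P_n = P \mid \mathcal{F}_n$
and $\tilde{P}_n = \tilde{P} \mid \mathcal{F}_n$), and let $z_n = \frac{d\tilde{P}_n}{dP_n}$, for $n \ge 1$.

Prove that if $\tau$ is a Markov time, then, on the set $\{\tau < \infty\}$, one has (P-a.e.)

$$
\tilde{P}_\tau \ll P_\tau \quad \text{and} \quad \frac{d\tilde{P}_\tau}{dP_\tau} = z_\tau.
$$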

Problem 7.6.9. Prove that $\tilde{P} \overset{loc}{\ll} P$ if and only if one can find an increasing
sequence of stopping times, $(\tau_n)_{n \ge 1}$, with $P\{\lim \tau_n = \infty\} = 1$, and with the
property $\tilde{P}_{\tau_n} \ll P_{\tau_n}$, for $n \ge 1$.


Problem 7.6.10. Let $\tilde{P} \overset{loc}{\ll} P$ and let $z_n = \frac{d\tilde{P}_n}{dP_n}$, for $n \ge 1$. Prove that

$$
\tilde{P}\Big\{\inf_n z_n > 0\Big\} = 1.
$$


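Problem 7.6.11. Let $\tilde{P} \overset{loc}{\ll} P$, $z_n = \frac{d\tilde{P}_n}{dP_n}$ for $n \ge 1$, and let $\mathcal{F}_\infty = \sigma(\bigcup_n \mathcal{F}_n)$. Prove
that the following conditions are equivalent:

(i) $\tilde{P}_\infty \ll P_\infty$, where $P_\infty = P \mid \mathcal{F}_\infty$ and $\tilde{P}_\infty = \tilde{P} \mid \mathcal{F}_\infty$;
(ii) $\tilde{P}\{\sup_n z_n < \infty\} = 1$;
(iii) the martingale $(z_n, \mathcal{F}_n)_{n \ge 1}$ is uniformly integrable.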


Problem 7.6.12. Let $(\Omega, \mathcal{F}, P)$ be any probability space and let $\mathcal{G}$ be any separable
$\sigma$-sub-algebra inside $\mathcal{F}$, which is generated by the sets $\{G_n, n \ge 1\}$, all included
in $\mathcal{F}$. Let $\mathcal{G}_n = \sigma(G_1, \ldots, G_n)$ and let $\mathcal{D}_n$ be the smallest partition of $\Omega$, which is
generated by $\mathcal{G}_n$.

Let $Q$ be any measure on $(\Omega, \mathcal{F})$ and set

$$
X_n(\omega) = \sum_{A \in \mathcal{D}_n} \frac{Q(A)}{P(A)} I_A(\omega)
$$

(with the understanding that $0/0 = 0$).

Prove that:

(a) The sequence $(X_n, \mathcal{G}_n)_{n \ge 1}$ forms a supermartingale (relative to the measure
P).

(b) If $Q \ll P$, then the sequence $(X_n, \mathcal{G}_n)_{n \ge 1}$ must be a martingale.


Problem 7.6.13. As a continuation of the previous problem, prove that, if $Q \ll P$,
then one can find a $\mathcal{G}$-measurable random variable, $X_\infty = X_\infty(\omega)$, for which
$X_n \overset{L^1}{\to} X_\infty$, $X_n = E(X_\infty \mid \mathcal{G}_n)$ (P-a.e.) and, for every $A \in \mathcal{G}$, one can claim that

$$
Q(A) = \int_A X_\infty \, dP.
$$

This is nothing but a special version of the Radon-Nikodym theorem from
[P §2.6], stated for separable $\sigma$-sub-algebras $\mathcal{G} \subseteq \mathcal{F}$.


Problem 7.6.14. (On the Kakutani dichotomy.) Let $\alpha_1, \alpha_2, \ldots$ be any sequence
of non-negative and independent random variables, with $E\alpha_i = 1$, and let $z_n =
\prod_{k=1}^{n} \alpha_k$, $z_0 = 1$.

Prove that:

(a) The sequence $(z_n)_{n \ge 0}$ is a non-negative martingale.

(b) The limit $\lim_n z_n$ ($= z_\infty$) exists with probability 1.

(c) The following conditions are equivalent:

(i) $Ez_\infty = 1$;
(ii) $z_n \overset{L^1}{\to} z_\infty$;
(iii) the family $(z_n)_{n \ge 0}$ is uniformly integrable;
(iv) $\sum_{n=1}^{\infty} (1 - E\sqrt{\alpha_n}) < \infty$;
(v) $\prod_{n=1}^{\infty} E\sqrt{\alpha_n} > 0$.

7.7 On the Asymptotics of the Probability for a Random 
Walk to Exit on a Curvilinear Boundary 


Problem 7.7.1. Prove that the sequence defined in [P §7.7, (4)] is a martingale. 
Can one make this claim without the condition |@,| < c (P-a.e.), for n > 1? 


Problem 7.7.2. Prove the formula in [ P §7.7, (13)]. 
Hint. It is enough to write the expression Ez” in the form 


“n 


n 1 
Ez = I] e( row jo = 5 > 


k=2 


and use the fact that all $\xi_i$ are normally ($\mathcal{N}(0, 1)$) distributed.

Problem 7.7.3. Prove the formula in [ P §7.7, (17)]. 


Problem 7.7.4. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables, and let $S_n = \xi_1 + \ldots + \xi_n$, for $n \ge 1$. Given any
constant $c > 0$, set

$$
\tau_{(>0)} = \inf\{n \ge 1 : S_n > 0\} \quad \text{and} \quad \tau_{(>c)} = \inf\{n \ge 1 : S_n > c\},
$$

with the understanding that $\inf \emptyset = \infty$. Prove that:

(a) $P\{\tau_{(>0)} < \infty\} = 1 \iff P\{\limsup_n S_n = \infty\} = 1$;
(b) $(E\tau_{(>0)} < \infty) \iff (E\tau_{(>c)} < \infty \text{ for all } c > 0)$.


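Problem 7.7.5. Assume the notation introduced in the previous problem, and set

$$
\tau_{(\ge 0)} = \inf\{n \ge 1 : S_n \ge 0\}, \quad \tau_{(<0)} = \inf\{n \ge 1 : S_n < 0\},
$$

and

$$
\tau_{(\le 0)} = \inf\{n \ge 1 : S_n \le 0\}.
$$

Prove that

$$
E\tau_{(>0)} = \frac{1}{P\{\tau_{(\le 0)} = \infty\}} \quad \text{and} \quad E\tau_{(\ge 0)} = \frac{1}{P\{\tau_{(<0)} = \infty\}}.
$$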


Problem 7.7.6. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically
distributed random variables with $E|\xi_1| > 0$, chosen so that $\frac{S_n}{n} \overset{P}{\to} 0$, where
$S_n = \xi_1 + \ldots + \xi_n$.
Prove that the Markov times $\tau_{(>0)}$ and $\tau_{(<0)}$, defined in the previous problem, are
finite, and that $\limsup S_n = \infty$ and $\liminf S_n = -\infty$, both with probability 1.


Problem 7.7.7. Let everything be as in the previous problem. Prove that $S_n \to \infty$ with probability 1 if and only if one can find a stopping time $\tau$ (relative to the filtration $(\mathscr{F}_n^{\xi})_{n \ge 1}$, with $\mathscr{F}_n^{\xi} = \sigma(\xi_1, \ldots, \xi_n)$), for which $\mathsf{E}\tau < \infty$ and $\mathsf{E}S_\tau > 0$.


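Problem 7.7.8. Let $(\Omega, \mathscr{F}, (\mathscr{F}_n)_{n \ge 0}, \mathsf{P})$ be some filtered probability space, and let $h = (h_n)_{n \ge 1}$ be some sequence of the form

$$h_n = \mu_n + \sigma_n \xi_n, \quad n \ge 1,$$

where $\mu_n \in \mathbb{R}$ and $\sigma_n > 0$ are $\mathscr{F}_{n-1}$-measurable random variables and $\xi = (\xi_n, \mathscr{F}_n)_{n \ge 1}$ is some stochastic sequence of independent and normally distributed ($\mathscr{N}(0, 1)$) random variables. Prove that the sequence $h = (h_n, \mathscr{F}_n)_{n \ge 1}$ is conditionally Gaussian, i.e.,

$$\mathrm{Law}(h_n \mid \mathscr{F}_{n-1}; \mathsf{P}) = \mathscr{N}(\mu_n, \sigma_n^2) \quad (\mathsf{P}\text{-a.e.}).$$

Setting

$$Z_n = \exp\Big\{ -\sum_{k=1}^{n} \frac{\mu_k}{\sigma_k}\, \xi_k - \frac{1}{2} \sum_{k=1}^{n} \Big(\frac{\mu_k}{\sigma_k}\Big)^2 \Big\}, \quad n \ge 1,$$

prove that the following properties hold:

(a) The sequence $Z = (Z_n)_{n \ge 1}$ is a martingale relative to the measure $\mathsf{P}$.
(b) If

$$\mathsf{E} \exp\Big\{ \frac{1}{2} \sum_{k=1}^{\infty} \Big(\frac{\mu_k}{\sigma_k}\Big)^2 \Big\} < \infty \quad \text{(Novikov’s condition)}$$

and

$$Z_\infty = \exp\Big\{ -\sum_{k=1}^{\infty} \frac{\mu_k}{\sigma_k}\, \xi_k - \frac{1}{2} \sum_{k=1}^{\infty} \Big(\frac{\mu_k}{\sigma_k}\Big)^2 \Big\},$$

then $Z = (Z_n)_{n \ge 1}$ is a uniformly integrable martingale, $Z_\infty = \lim Z_n$ with probability 1, and $Z_n = \mathsf{E}(Z_\infty \mid \mathscr{F}_n)$ (P-a.e.), for any $n \ge 1$.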


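Problem 7.7.9. Adopting the notation introduced in the previous problem, let $\mathscr{F} = \sigma(\bigcup \mathscr{F}_n)$, and let $\widetilde{\mathsf{P}}$ be the probability measure defined by

$$\widetilde{\mathsf{P}}(d\omega) = Z_\infty\, \mathsf{P}(d\omega).$$

Prove that if $\mathsf{E} Z_\infty = 1$, then one can claim that

$$\mathrm{Law}(h_n \mid \mathscr{F}_{n-1}; \widetilde{\mathsf{P}}) = \mathscr{N}(0, \sigma_n^2) \quad (\widetilde{\mathsf{P}}\text{-a.e.}).$$

If, furthermore, $\sigma_n^2 = \sigma_n^2(\omega)$ is independent from $\omega$, then

$$\mathrm{Law}(h_n \mid \widetilde{\mathsf{P}}) = \mathscr{N}(0, \sigma_n^2),$$

and the random variables $h_1, h_2, \ldots$ are independent, relative to the measure $\widetilde{\mathsf{P}}$.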


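Problem 7.7.10. Let $\mu_k$, $\sigma_k$, $\xi_k$ and $h_k$, for $k \ge 1$, be as in Problem 7.7.8, let $H_n = h_1 + \ldots + h_n$, $n \ge 1$, and let $X_n = e^{H_n}$.
Prove that if

$$\mu_k + \frac{\sigma_k^2}{2} = 0 \quad (\mathsf{P}\text{-a.e.}) \text{ for } k \ge 1,$$

then the sequence $X = (X_n, \mathscr{F}_n)_{n \ge 1}$ is a martingale.
Now suppose that, for some $k \ge 1$, the above condition fails, and set

$$Z_\infty = \exp\Big\{ -\sum_{k=1}^{\infty} \Big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\Big) \xi_k - \frac{1}{2} \sum_{k=1}^{\infty} \Big(\frac{\mu_k}{\sigma_k} + \frac{\sigma_k}{2}\Big)^2 \Big\}.$$

Assuming that $\mathsf{E} Z_\infty = 1$, define the measure

$$\widetilde{\mathsf{P}}(d\omega) = Z_\infty\, \mathsf{P}(d\omega),$$

and let $\mathscr{F} = \sigma(\bigcup \mathscr{F}_n)$.
Prove that relative to the measure $\widetilde{\mathsf{P}}$ the sequence $(X_n, \mathscr{F}_n)_{n \ge 1}$, with $X_n = e^{H_n}$, is a martingale.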

7.8 The Central Limit Theorem for Sums of Dependent 
Random Variables 


Problem 7.8.1. Consider the random variables $\xi_n = \eta_n + \zeta_n$, $n \ge 1$, and suppose that $\eta_n \xrightarrow{d} \eta$ and $\zeta_n \xrightarrow{d} 0$. Prove that $\xi_n \xrightarrow{d} \eta$.


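Problem 7.8.2. Let $(\xi_n(\varepsilon))$, $n \ge 1$, be some family of random variables, which is parameterized by $\varepsilon > 0$, and suppose that $\xi_n(\varepsilon) \xrightarrow{\mathsf{P}} 0$ as $n \to \infty$, for every $\varepsilon > 0$.
By using, for example, the result in Problem 2.10.11, prove that one can construct the sequence $\varepsilon_n \downarrow 0$ in such a way that $\xi_n(\varepsilon_n) \xrightarrow{\mathsf{P}} 0$.
Hint. Choose the sequence $\varepsilon_n \downarrow 0$ so that $\mathsf{P}\{|\xi_n(\varepsilon_n)| \ge 2^{-n}\} \le 2^{-n}$, $n \ge 1$.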


Problem 7.8.3. Consider the complex-valued random variables $(\alpha_k^n)$, $1 \le k \le n$, $n \ge 1$, chosen so that for some constant $C > 0$ and for some positive sequence $(a_n)_{n \ge 1}$, with $a_n \downarrow 0$, one has for every $n \ge 1$:

$$\sum_{k=1}^{n} |\alpha_k^n| \le C \quad \text{and} \quad |\alpha_k^n| \le a_n, \text{ for } 1 \le k \le n, \quad (\mathsf{P}\text{-a.e.}).$$

Prove that

$$\lim_n \prod_{k=1}^{n} (1 + \alpha_k^n)\, e^{-\alpha_k^n} = 1 \quad (\mathsf{P}\text{-a.e.}).$$
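A quick numerical illustration of the limit in Problem 7.8.3, under the simplifying assumption (not from the text) that the array is deterministic with $\alpha_k^n = it/n$, so that $\sum_k |\alpha_k^n| = |t|$ and $|\alpha_k^n| = |t|/n \downarrow 0$:

```python
import cmath

def normalized_product(t: float, n: int) -> complex:
    """Product over k = 1..n of (1 + a) * exp(-a) with a = i*t/n for every k."""
    alpha = 1j * t / n
    prod = 1 + 0j
    for _ in range(n):
        prod *= (1 + alpha) * cmath.exp(-alpha)
    return prod

for n in (10, 100, 10_000):
    print(n, abs(normalized_product(2.0, n) - 1))
```

The distance from 1 shrinks roughly like $t^2/(2n)$, in line with the expansion $\log(1+\alpha) - \alpha \approx -\alpha^2/2$.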


Problem 7.8.4. Prove the statement formulated in Remark 2, following [P §7.8, 
Theorem 1]. 


Problem 7.8.5. Prove the statement formulated in the remark following the lemma 
in [P §7.8, 4]. 


Problem 7.8.6. Prove [P §7.8, Theorem 3]. 
Problem 7.8.7. Prove [P §7.8, Theorem 5]. 


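Problem 7.8.8. Assuming that $\xi = (\xi_n)_{-\infty < n < \infty}$ is some sequence of independent and identically distributed random variables, with $\mathsf{E}\xi_n = 0$ and $\mathsf{D}\xi_n < \infty$, consider the sequence $\eta = (\eta_n)_{n \ge 1}$, given by

$$\eta_n = \sum_{j=-\infty}^{\infty} c_{n-j}\, \xi_j, \quad \text{with} \quad \sum_{j=-\infty}^{\infty} |c_j|^2 < \infty,$$

and suppose that

$$D_n^2 = \mathsf{E}(\eta_1 + \ldots + \eta_n)^2 \to \infty.$$

Prove the following central limit theorem:

$$\mathsf{P}\Big\{ \frac{\eta_1 + \ldots + \eta_n}{D_n} \le x \Big\} \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$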


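Problem 7.8.9. Let $(\Omega^n, \mathscr{F}^n, (\mathscr{F}_k^n)_{0 \le k \le n}, \mathsf{P}^n)$, $n \ge 1$, be some sequence of filtered probability spaces and suppose that, given any $n \ge 1$, the random variables $\xi^n = (\xi_k^n)_{1 \le k \le n}$ are chosen so that each $\xi_k^n$ is $\mathscr{F}_k^n$-measurable.

Let $\mu$ be any infinitely divisible distribution on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, with characteristics $(b, c, F)$ (see Problem 3.6.17 and the continuous cutoff function $h = h(x)$ in that problem).

Consider the sequence of probability distributions associated with the random variables $Z^n = \sum_{k=1}^{n} \xi_k^n$, $n \ge 1$, and prove that in order to guarantee the weak convergence of that sequence (of distributions) to the infinitely divisible distribution $\mu$, it is enough to require that the following conditions hold:

$$\sup_{1 \le k \le n} \mathsf{P}^n\{|\xi_k^n| > \varepsilon \mid \mathscr{F}_{k-1}^n\} \xrightarrow{\mathsf{P}} 0, \quad \varepsilon > 0,$$

$$\sum_{1 \le k \le n} \mathsf{E}^n[h(\xi_k^n) \mid \mathscr{F}_{k-1}^n] \xrightarrow{\mathsf{P}} b,$$

$$\sum_{1 \le k \le n} \Big( \mathsf{E}^n[h^2(\xi_k^n) \mid \mathscr{F}_{k-1}^n] - \big(\mathsf{E}^n[h(\xi_k^n) \mid \mathscr{F}_{k-1}^n]\big)^2 \Big) \xrightarrow{\mathsf{P}} \tilde{c},$$

$$\sum_{1 \le k \le n} \mathsf{E}^n[g(\xi_k^n) \mid \mathscr{F}_{k-1}^n] \xrightarrow{\mathsf{P}} F(g), \quad g \in \mathscr{G}_1,$$

where $\tilde{c} = c + \int h^2(x)\, F(dx)$, $\mathscr{G}_1 = \{g\}$ stands for the class of functions of the form $g_a(x) = (a|x| - 1)^+ \wedge 1$ for various choices of the rational number $a$, and

$$F(g) = \int g(x)\, F(dx).$$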


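Problem 7.8.10. Let $\xi_0, \xi_1, \xi_2, \ldots$ be some sequence that is stationary in strict sense, with $\mathsf{E}\xi_0 = 0$. Let (comp. with Problem 6.3.5)

$$\alpha_k = \sup |\mathsf{P}(A \cap B) - \mathsf{P}(A)\mathsf{P}(B)|, \quad k \ge 1,$$

where the supremum is taken over all sets

$$A \in \mathscr{F}_0 = \sigma(\xi_0), \quad B \in \mathscr{F}_k^\infty = \sigma(\xi_k, \xi_{k+1}, \ldots).$$

Prove that if the strong mixing coefficients, $\alpha_k$, $k \ge 1$, are such that, for some $p > 2$, one has

$$\sum_{k \ge 1} \alpha_k^{\frac{p-2}{p}} < \infty \quad \text{and} \quad \mathsf{E}|\xi_0|^p < \infty,$$

then the joint distribution, $\mathsf{P}^n_{t_1, \ldots, t_k}$, of the variables $X^n_{t_1}, \ldots, X^n_{t_k}$, given by

$$X_t^n = \frac{1}{\sqrt{n}} \sum_{k=1}^{[nt]} \xi_k, \quad t \ge 0,$$

converges weakly to the distribution, $\mathsf{P}_{t_1, \ldots, t_k}$, of the variables $(\sqrt{c}\, B_{t_1}, \ldots, \sqrt{c}\, B_{t_k})$, where $B = (B_t)_{t \ge 0}$ is a Brownian motion process and the constant $c$ is given by

$$c = \mathsf{E}\xi_0^2 + 2 \sum_{k \ge 1} \mathsf{E}\xi_0 \xi_k.$$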


7.9 Discrete Version of the Ito Formula 


Problem 7.9.1. Prove the formula in [ P §7.9, (15)]. 


Problem 7.9.2. Based on the central limit theorem for the random walk $S = (S_n)_{n \ge 0}$, establish the following formula

$$\mathsf{E}|S_n| \sim \sqrt{\frac{2}{\pi}\, n} \quad \text{as } n \to \infty.$$

(Comp. with the hint to Problem 1.9.3.)


Remark. In formulas (17) and (18) in [P §7.9, Example 2] one can actually replace $2\pi$ in the denominator with $\pi/2$.
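The asymptotics in Problem 7.9.2 can be checked against an exact computation. The sketch below assumes that $S_n$ is the symmetric simple random walk (steps $\pm 1$ with probability 1/2 each) and evaluates $\mathsf{E}|S_n|$ from the binomial distribution; the function names are illustrative.

```python
import math

def mean_abs_walk(n: int) -> float:
    """Exact E|S_n| for the symmetric +-1 random walk started at 0."""
    # S_n = 2k - n when exactly k of the n steps are +1
    total = sum(math.comb(n, k) * abs(2 * k - n) for k in range(n + 1))
    return total / 2 ** n

def ratio_to_asymptote(n: int) -> float:
    """E|S_n| divided by sqrt(2n/pi); tends to 1 as n grows."""
    return mean_abs_walk(n) / math.sqrt(2 * n / math.pi)

for n in (10, 100, 1000):
    print(n, ratio_to_asymptote(n))
```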


Problem 7.9.3. Prove the formula in [ P §7.9, (22)]. 


Problem 7.9.4. Formula [P §7.9, (24)] remains valid for every function $F \in C^2$.
Try to prove this claim. 


Problem 7.9.5. Generalize formula [P §7.9, (11)] for the case where the function $F(X_k)$ is replaced by a non-homogeneous vector function of the form $F(k, X_k^1, \ldots, X_k^d)$.


Problem 7.9.6. Setting $f(x) = F'(x)$, consider the following trivial identity, which may be viewed as a discrete version of the Itô formula:

$$F(X_n) = F(X_0) + \sum_{k=1}^{n} f(X_{k-1})\, \Delta X_k + \sum_{k=1}^{n} \big[ F(X_k) - F(X_{k-1}) - f(X_{k-1})\, \Delta X_k \big].$$

Outline the reasoning which, starting from the last relation, allows one to obtain the discrete version of the Itô formula (formula [P §7.9, (24)]), for twice continuously differentiable functions $F = F(x)$.
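The identity in Problem 7.9.6 is a telescoping sum, which a few lines of code make concrete. The sketch below uses a hypothetical helper `ito_decomposition` and, as an illustrative choice, $F(x) = x^2$ with $f(x) = 2x$, for which the correction term reduces to $\sum_k (\Delta X_k)^2$.

```python
def ito_decomposition(F, f, path):
    """Return (F(X_0), sum of f(X_{k-1}) dX_k, correction term) for a finite path."""
    increments = [(path[k - 1], path[k]) for k in range(1, len(path))]
    stochastic_sum = sum(f(prev) * (cur - prev) for prev, cur in increments)
    correction = sum(F(cur) - F(prev) - f(prev) * (cur - prev)
                     for prev, cur in increments)
    return F(path[0]), stochastic_sum, correction

path = [0, 1, 0, -1, -2, -1]
start, stochastic_sum, correction = ito_decomposition(
    lambda x: x * x, lambda x: 2 * x, path)
# F(X_n) is recovered as the sum of the three pieces
print(start + stochastic_sum + correction, path[-1] ** 2)
```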


Problem 7.9.7. Generalize the identity in the previous problem for the case where the function $F(X_k)$ is replaced by a non-homogeneous vector function of the form $F(k, X_k^1, \ldots, X_k^d)$.


Problem 7.9.8. (Discrete version of Tanaka formula; see Problem 1.9.3.) Consider some symmetric Bernoulli scheme (i.e., a sequence of independent and identically distributed random variables), $\xi_1, \xi_2, \ldots$, with $\mathsf{P}\{\xi_n = +1\} = \mathsf{P}\{\xi_n = -1\} = 1/2$, $n \ge 1$, and let $S_0 = 0$ and $S_n = \xi_1 + \ldots + \xi_n$, for $n \ge 1$. Given any $x \in \mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$, let

$$N_n(x) = \#\{k,\ 0 \le k \le n : S_k = x\}$$

be the number of the integers $0 \le k \le n$, for which $S_k = x$.

Prove the following discrete analog of Tanaka formula:

$$|S_n - x| = |x| + \sum_{k=1}^{n} \operatorname{sign}(S_{k-1} - x)\, \Delta S_k + N_{n-1}(x).$$

Remark. If $B = (B_t)_{t \ge 0}$ is a Brownian motion, then the renowned Tanaka formula gives

$$|B_t - x| = |x| + \int_0^t \operatorname{sign}(B_s - x)\, dB_s + N_t(x),$$

where $N(x) = (N_t(x))_{t \ge 0}$ is the local time of the Brownian motion $B$ at level $x \in \mathbb{R}$. Recall that, originally, P. Lévy defined the local time $N_t(x)$ as (see, for example, [12] and [103]):

$$N_t(x) = \lim_{\varepsilon \downarrow 0} \frac{1}{2\varepsilon} \int_0^t I(x - \varepsilon < B_s < x + \varepsilon)\, ds.$$
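The discrete Tanaka formula of Problem 7.9.8 can be verified by brute force on short paths. The sketch below enumerates all $\pm 1$ paths of a given length and compares both sides; it assumes the convention $\operatorname{sign}(0) = 0$, and the helper names are hypothetical.

```python
from itertools import product

def sign(v: int) -> int:
    """Sign with the convention sign(0) = 0 (an assumption of this sketch)."""
    return (v > 0) - (v < 0)

def tanaka_sides(steps, x):
    """Left and right sides of the discrete Tanaka formula for one path."""
    walk = [0]
    for step in steps:
        walk.append(walk[-1] + step)
    n = len(steps)
    lhs = abs(walk[n] - x)
    stochastic_sum = sum(sign(walk[k - 1] - x) * (walk[k] - walk[k - 1])
                         for k in range(1, n + 1))
    visits = sum(1 for k in range(n) if walk[k] == x)  # N_{n-1}(x)
    return lhs, abs(x) + stochastic_sum + visits

def check_all_paths(n: int, levels) -> bool:
    """True if the formula holds for every +-1 path of length n and every level."""
    for steps in product((-1, 1), repeat=n):
        for x in levels:
            lhs, rhs = tanaka_sides(steps, x)
            if lhs != rhs:
                return False
    return True

print(check_all_paths(8, range(-3, 4)))
```

The check works because $|S_k - x| - |S_{k-1} - x|$ equals $\operatorname{sign}(S_{k-1} - x)\,\Delta S_k$ when $S_{k-1} \ne x$ and equals 1 when $S_{k-1} = x$.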


7.10 The Probability for Ruin in Insurance. Martingale 
Approach 


Problem 7.10.1. Prove that, under assumption A in [P §7.10, 2], the process $N = (N_t)_{t \ge 0}$ has independent increments.


Problem 7.10.2. Prove that the process $X = (X_t)_{t \ge 0}$, defined in [P §7.10, 1], also has independent increments.


Problem 7.10.3. Consider the Cramér-Lundberg model, and formulate the analog of the Theorem in [P §7.10, 3], for the case where the variables $\sigma_i$, $i = 1, 2, \ldots$, are independent and distributed with geometric law, i.e., $\mathsf{P}\{\sigma_i = k\} = q^{k-1} p$, $k \ge 1$.

Problem 7.10.4. Let $N = (N_t)_{t \ge 0}$ be a Poisson process of parameter $\lambda$ (see [P §7.10, (3)]). Prove the following “Markov property”: for every choice of $0 = t_0 < t_1 < \ldots < t_n$ and $0 \le k_1 \le k_2 \le \ldots \le k_n$, one has

$$\mathsf{P}(N_{t_n} = k_n \mid N_{t_1} = k_1, \ldots, N_{t_{n-1}} = k_{n-1}) = \mathsf{P}(N_{t_n} = k_n \mid N_{t_{n-1}} = k_{n-1}).$$


Problem 7.10.5. Let $N = (N_t)_{t \ge 0}$ be a standard (i.e., of parameter $\lambda = 1$) Poisson process, and suppose that $\Lambda(t)$ is some non-decreasing and continuous function, with $\Lambda(0) = 0$. Then consider the process $N \circ \Lambda = (N_{\Lambda(t)})_{t \ge 0}$. Describe the properties of this process (finite dimensional distributions, moments, etc.).


Problem 7.10.6. Let $(T_1, \ldots, T_n)$ denote the times of the first $n$ jumps of a given Poisson process, let $(X_1, \ldots, X_n)$ be independent and identically distributed random variables, which are uniformly distributed on the interval $[0, t]$, and, finally, let $(X_{(1)}, \ldots, X_{(n)})$ denote the order statistics of the variables $(X_1, \ldots, X_n)$. Prove that

$$\mathrm{Law}(T_1, \ldots, T_n \mid N_t = n) = \mathrm{Law}(X_{(1)}, \ldots, X_{(n)}),$$

i.e., the conditional distribution of the vector $(T_1, \ldots, T_n)$, given the event $N_t = n$, coincides with the distribution of the vector $(X_{(1)}, \ldots, X_{(n)})$.


Problem 7.10.7. Convince yourself that, if $(N_t)_{t \ge 0}$ is a Poisson process, then for any $s < t$ one can write

$$\mathsf{P}(N_s = m \mid N_t = n) = \begin{cases} C_n^m\, (s/t)^m (1 - s/t)^{n-m}, & m \le n, \\ 0, & m > n. \end{cases}$$
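The conditional law in Problem 7.10.7 can be checked numerically from the independence of increments. The sketch below assumes a homogeneous Poisson process of rate `lam` (a name introduced here for illustration) and compares the conditional probability with the binomial formula.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P{Poisson(mean) = k}."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def conditional_via_increments(m: int, n: int, s: float, t: float, lam: float) -> float:
    """P(N_s = m | N_t = n), using that N_s and N_t - N_s are independent."""
    joint = poisson_pmf(m, lam * s) * poisson_pmf(n - m, lam * (t - s))
    return joint / poisson_pmf(n, lam * t)

def binomial_pmf(m: int, n: int, p: float) -> float:
    """C_n^m p^m (1-p)^(n-m)."""
    return math.comb(n, m) * p ** m * (1 - p) ** (n - m)

for m in range(6):
    print(m, conditional_via_increments(m, 5, 1.0, 4.0, 2.5), binomial_pmf(m, 5, 0.25))
```

Note that the rate `lam` cancels, so the conditional law depends only on the ratio $s/t$.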


Problem 7.10.8. It is an elementary matter to check that, if $X_1$ and $X_2$ are two independent random variables that have Poisson distribution with parameters, respectively, $\lambda_1$ and $\lambda_2$, then $X_1 + X_2$ also has Poisson distribution (with parameter $\lambda_1 + \lambda_2$). Prove the converse statement (due to D. Raikov): if $X_1$ and $X_2$ are any two independent and non-degenerate random variables, for which one can claim that $X_1 + X_2$ is distributed with Poisson law, then $X_1$ and $X_2$ also must be distributed with Poisson law.


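Problem 7.10.9. Suppose that $N = (N_t)_{t \ge 0}$ is a standard Poisson process, which is independent from the positive random variable $\theta$, and then consider the “hybrid” process $\widetilde{N} = (\widetilde{N}_t)_{t \ge 0}$, given by $\widetilde{N}_t = N_{t\theta}$. Prove the following properties:

(a) Strong law of large numbers:

$$\frac{\widetilde{N}_t}{t} \to \theta \quad \text{as } t \to \infty \quad (\mathsf{P}\text{-a.e.})$$

(comp. with Example 4 in [P §4.3, 4]).

(b) Central limit theorem:

$$\mathsf{P}\Big\{ \frac{\widetilde{N}_t - \theta t}{\sqrt{\theta t}} \le x \Big\} \to \Phi(x) \quad \text{as } t \to \infty.$$

(c) If $\mathsf{D}\theta < \infty$, then

$$\frac{\widetilde{N}_t - \mathsf{E}\widetilde{N}_t}{\sqrt{\mathsf{D}\widetilde{N}_t}} \xrightarrow{d} \frac{\theta - \mathsf{E}\theta}{\sqrt{\mathsf{D}\theta}}.$$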


Problem 7.10.10. Prove that, for a given $u > 0$, the “ruin function”

$$\psi(u) = \mathsf{P}\Big\{ \inf_{t \ge 0} X_t \le 0 \Big\} \quad (= \mathsf{P}\{T < \infty\})$$

may be written in the form

$$\psi(u) = \mathsf{P}\Big\{ \sup_{n \ge 1} Y_n \ge u \Big\},$$

where $Y_n = \sum_{i=1}^{n} (\xi_i - c\sigma_i)$.
In addition, prove that the estimate $\psi(u) \le e^{-Ru}$, which, under appropriate assumptions, was derived in [P §7.10] by using “martingale” methods, can be established by using more elementary tools. Specifically, setting $\psi_n(u) = \mathsf{P}\{\max_{1 \le k \le n} Y_k \ge u\}$, $n \ge 1$, prove first that $\psi_1(u) \le e^{-Ru}$, and then prove by induction that $\psi_n(u) \le e^{-Ru}$ for any $n \ge 1$, so that $\psi(u) = \lim_n \psi_n(u) \le e^{-Ru}$.


Problem 7.10.11. The time of ruin, $T$, was defined by the formula $T = \inf\{t \ge 0 : X_t \le 0\}$. Alternatively, the time of ruin may be defined as $\widetilde{T} = \inf\{t \ge 0 : X_t < 0\}$. Explain how the results established in [P §7.10] would change if the time $T$ is to be replaced by the time $\widetilde{T}$.


Problem 7.10.12. As a generalization of the (homogeneous) Poisson process that was introduced in [P §7.10, 2], consider the non-homogeneous Poisson process $N = (N_t)_{t \ge 0}$, defined as:

$$N_t = \sum_{i \ge 1} I(T_i \le t),$$

where $T_i = \sigma_1 + \ldots + \sigma_i$ and the random variables $\sigma_i$ are independent and identically distributed with

$$\mathsf{P}\{\sigma_i \le t\} = 1 - \exp\Big\{ -\int_0^t \lambda(s)\, ds \Big\}.$$

The function $\lambda(t)$ above, which is known as the intensity function of the process $N$, is assumed to satisfy: $\lambda(t) \ge 0$, $\int_0^t \lambda(s)\, ds < \infty$ and $\int_0^\infty \lambda(s)\, ds = \infty$. Prove that

$$\mathsf{P}\{N_t < k\} = \mathsf{P}\{T_k > t\} = \sum_{i=0}^{k-1} \exp\Big\{ -\int_0^t \lambda(s)\, ds \Big\} \frac{\big(\int_0^t \lambda(s)\, ds\big)^i}{i!}.$$

Problem 7.10.13. Let $N = (N_t)_{t \ge 0}$ be the non-homogeneous Poisson process defined in Problem 7.10.12 above, let $(\eta_n)_{n \ge 0}$ be a sequence of independent and identically distributed random variables, which are also independent from $N$, and, finally, let $g = g(t, x)$ be some non-negative function on $\mathbb{R} \times \mathbb{R}$. Prove the Campbell formula:

$$\mathsf{E} \sum_{n=1}^{\infty} g(T_n, \eta_n)\, I(T_n \le t) = \int_0^t \mathsf{E}[g(s, \eta_1)]\, \lambda(s)\, ds.$$


Problem 7.10.14. Let $N = (N_t)_{t \ge 0}$ be a homogeneous Poisson process, defined by $N_0 = 0$ and $N_t = \sum_{n \ge 1} I(T_n \le t)$, for $t > 0$, the random variables $\sigma_{n+1} = T_{n+1} - T_n$ ($n \ge 0$, $T_0 = 0$) being independent and identically distributed, with law

$$\mathsf{P}\{\sigma_{n+1} \ge x\} = e^{-\lambda x}, \quad x \ge 0.$$

Setting $U_t = t - T_{N_t}$ and $V_t = T_{N_t + 1} - t$, prove that

$$\mathsf{P}\{U_t \le u,\ V_t \le v\} = \big[ I_{\{u \ge t\}} + I_{\{u < t\}} (1 - e^{-\lambda u}) \big] (1 - e^{-\lambda v}).$$

(In particular, for any fixed $t > 0$, the variables $U_t$ and $V_t$ are independent, and $V_t$ is exponentially distributed with parameter $\lambda$.) Find the probability $\mathsf{P}\{T_{N_t + 1} - T_{N_t} \ge x\}$, and prove that $\mathsf{P}\{T_{N_t + 1} - T_{N_t} \ge x\} \ne e^{-\lambda x}$ ($= \mathsf{P}\{T_{n+1} - T_n \ge x\}$). Prove that, as $t \to \infty$, the distribution of $T_{N_t + 1} - T_{N_t}$ converges weakly to the distribution law of the sum of two independent exponentially distributed random variables of the same parameter $\lambda$.


7.11 On the Fundamental Theorem of Financial 
Mathematics: Martingale Characterization 
of the Absence of Arbitrage 


Problem 7.11.1. Prove that with $N = 1$ the no-arbitrage condition is equivalent to the inequality [P §7.11, (18)]. (It is assumed that $\mathsf{P}\{\Delta S_1 = 0\} < 1$.)


Problem 7.11.2. Prove that in the proof of Lemma 1 in [P §7.11, 4] condition (19) 
makes case (2) impossible. 


Problem 7.11.3. Prove that the measure $\widetilde{\mathsf{P}}$ from Example 1 in [P §7.11, 5] is a martingale measure and that this measure is unique in the class $\mathscr{M}(\mathsf{P})$.


Problem 7.11.4. Investigate the uniqueness of the martingale measure constructed 
in Example 2 in [P §7.11, 5]. 


Problem 7.11.5. Prove that in the $(B, S)$-model the assumption $|\mathscr{M}(\mathsf{P})| = 1$ implies that the variables $S_n$, $1 \le n \le N$, are “conditionally bi-valued.”


Problem 7.11.6. According to Remark 1, following [P §7.11, Theorem 1], the First Fundamental Theorem remains valid for any $N < \infty$ and any $d < \infty$. Prove by way of example that if $d = \infty$, then it could happen that the market is free of arbitrage and yet no martingale measure exists.


Problem 7.11.7. In addition to [P §7.11, Definition 1], we say that the $(B, S)$-market is free of arbitrage in weak sense, if, for every self-financing portfolio $\pi = (\beta, \gamma)$, with $X_0^\pi = 0$ and $X_n^\pi \ge 0$ (P-a.e.), for $n \le N$, one has $X_N^\pi = 0$ (P-a.e.). We say that the $(B, S)$-market is arbitrage-free in strong sense if, for every self-financing portfolio $\pi$, with $X_0^\pi = 0$ and $X_N^\pi \ge 0$ (P-a.e.), one has $X_n^\pi = 0$ (P-a.e.), for $0 \le n \le N$.

Assuming that all assumptions in [ P §7.11, Theorem 1] are in force, prove that 
the following conditions are equivalent: 


(i) The (B, S)-market is free of arbitrage. 
(ii) The (B, S)-market is free of arbitrage in weak sense. 
(iii) The (B, S)-market is free of arbitrage in strong sense. 


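Problem 7.11.8. Just as in [P §7.11, Theorem 1], consider the family of all martingale measures:

$$\mathscr{M}(\mathsf{P}) = \{\widetilde{\mathsf{P}} \sim \mathsf{P} : S/B \text{ is a } \widetilde{\mathsf{P}}\text{-martingale}\}$$

and let

$$\mathscr{M}_{loc}(\mathsf{P}) = \{\widetilde{\mathsf{P}} \sim \mathsf{P} : S/B \text{ is a } \widetilde{\mathsf{P}}\text{-local martingale}\},$$

$$\mathscr{M}_b(\mathsf{P}) = \Big\{\widetilde{\mathsf{P}} \sim \mathsf{P} : \widetilde{\mathsf{P}} \in \mathscr{M}(\mathsf{P}) \text{ and } \frac{d\widetilde{\mathsf{P}}}{d\mathsf{P}}(\omega) \le C(\widetilde{\mathsf{P}}) \ (\mathsf{P}\text{-a.e.}) \text{ for some constant } C(\widetilde{\mathsf{P}}) \Big\}.$$

Prove that, in the setting of [P §7.11, Theorem 1], the following conditions are equivalent:

(i) $\mathscr{M}(\mathsf{P}) \ne \emptyset$; (ii) $\mathscr{M}_{loc}(\mathsf{P}) \ne \emptyset$; (iii) $\mathscr{M}_b(\mathsf{P}) \ne \emptyset$.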


7.12 Hedging of Financial Contracts in Arbitrage-Free 
Markets 


Problem 7.12.1. Find the price, $C(f_N; \mathsf{P})$, of a standard call option with payoff $f_N = (S_N - K)^+$, in the $(B, S)$-market described in Example 2 in [P §7.11, 5].


Problem 7.12.2. Prove the inequality in [ P §7.12, (10)] in the opposite direction. 
Problem 7.12.3. Prove formulas [ P §7.12, (12) and (13)]. 

Problem 7.12.4. Give a detailed derivation of formula [ P §7.12, (23)]. 

Problem 7.12.5. Prove formulas [ P §7.12, (25) and (28)]. 

Problem 7.12.6. Give a detailed derivation of formula [ P §7.12, (32)]. 


Problem 7.12.7. Consider the one-period version of the CRR-model formulated in (17) in [P §7.12, 7]:

$$B_1 = B_0(1 + r), \quad S_1 = S_0(1 + \rho),$$

where we suppose that $\rho$ takes two values, $a$ and $b$, chosen so that $-1 < a < r < b$.

Now suppose that $\rho$ is uniformly distributed in the interval $[a, b]$ (with the same choice for $a$ and $b$) and consider the period 1 payoff $f(S_1) = f(S_0(1 + \rho))$, for some convex-down and continuous payoff function $f = f(x)$, $x \in [S_0(1 + a), S_0(1 + b)]$ (here we suppose that $S_0 = \mathrm{const}$). Prove that the upper hedging price:

$$C(f; \mathsf{P}) = \inf\{x : \exists \pi,\ X_0^\pi = x \text{ and } X_1^\pi \ge f(S_0(1 + \rho))\ \forall \rho \in [a, b]\},$$

coincides with the upper hedging price in [P §7.12, (19)], with $N = 1$ and with $\mathsf{P}\{\rho = b\} = p$ and $\mathsf{P}\{\rho = a\} = 1 - p$, $0 < p < 1$, so that

$$C(f; \mathsf{P}) = \frac{r - a}{b - a} \cdot \frac{f(S_0(1 + b))}{1 + r} + \frac{b - r}{b - a} \cdot \frac{f(S_0(1 + a))}{1 + r}.$$

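The closing formula of Problem 7.12.7 is easy to evaluate. The sketch below is only an illustration under assumed numbers ($S_0 = 100$, $a = -0.1$, $b = 0.2$, $r = 0.05$ and a call payoff with strike 100, none of which come from the text); the function name is hypothetical.

```python
def upper_hedging_price(payoff, s0: float, a: float, b: float, r: float) -> float:
    """One-period upper hedging price for a convex payoff with rho in [a, b].

    Uses the weights (r - a)/(b - a) and (b - r)/(b - a) on the two extreme
    outcomes rho = b and rho = a, discounted by 1 + r.
    """
    weight_up = (r - a) / (b - a)
    weight_down = (b - r) / (b - a)
    value = weight_up * payoff(s0 * (1 + b)) + weight_down * payoff(s0 * (1 + a))
    return value / (1 + r)

def call_payoff(x: float) -> float:
    """Payoff of a call with the illustrative strike 100."""
    return max(x - 100.0, 0.0)

price = upper_hedging_price(call_payoff, 100.0, -0.1, 0.2, 0.05)
print(price)
```

Only the two endpoint values of the payoff enter, since for a convex payoff the chord through the endpoints dominates the payoff on the whole interval.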
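Problem 7.12.8. (The Black-Scholes formula.) As a generalization of the discrete-time $(B, S)$-market $B = (B_n)_{0 \le n \le N}$ and $S = (S_n)_{0 \le n \le N}$ (see [P §7.12, 2]), consider the continuous-time $(B, S)$-market model

$$B_t = B_0 e^{rt} \quad \text{and} \quad S_t = S_0 e^{\mu t + \sigma W_t}, \quad 0 \le t \le T, \qquad (*)$$

in which $\mu, \sigma \in \mathbb{R}$ and $r \ge 0$ are exogenously specified constants and $W = (W_t)_{0 \le t \le T}$ is an exogenously specified Brownian motion. Analogously to [P §7.12, (1)], for a given strike-price $K > 0$, consider a European-style call option with termination payoff $f_T = (S_T - K)^+ \equiv \max[S_T - K, 0]$ and suppose that in (*) the constant $\mu$ is chosen to be $\mu = r - \frac{\sigma^2}{2}$. Under these conditions, prove the following properties:

(a) The process $\big(\frac{S_t}{B_t}\big)_{0 \le t \le T}$ is a martingale.

(b) The “fair” price of the call option, $C(f_T; \mathsf{P})$, defined as

$$C(f_T; \mathsf{P}) = B_0\, \mathsf{E}\frac{f_T}{B_T},$$

can be computed according to the Black-Scholes formula:

$$C(f_T; \mathsf{P}) = S_0\, \Phi\Big( \frac{\ln\frac{S_0}{K} + T\big(r + \frac{\sigma^2}{2}\big)}{\sigma\sqrt{T}} \Big) - K e^{-rT}\, \Phi\Big( \frac{\ln\frac{S_0}{K} + T\big(r - \frac{\sigma^2}{2}\big)}{\sigma\sqrt{T}} \Big),$$

where $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy$, $x \in \mathbb{R}$.

Hint. Prove that $C(f_T; \mathsf{P}) = e^{-rT}\, \mathsf{E}(a e^{b\xi - b^2/2} - K)^+$, where $a = S_0 e^{rT}$, $b = \sigma\sqrt{T}$ and $\xi \sim \mathscr{N}(0, 1)$. By using direct calculation prove that

$$\mathsf{E}(a e^{b\xi - b^2/2} - K)^+ = a\, \Phi\Big( \frac{\ln\frac{a}{K} + \frac{1}{2} b^2}{b} \Big) - K\, \Phi\Big( \frac{\ln\frac{a}{K} - \frac{1}{2} b^2}{b} \Big).$$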
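As a sanity check on the Black-Scholes formula and on the expectation suggested in the hint, the following sketch evaluates both: the closed form via `math.erf`, and the expectation $e^{-rT}\mathsf{E}(a e^{b\xi - b^2/2} - K)^+$ by a plain trapezoid rule. The parameter values and function names are illustrative assumptions.

```python
import math

def norm_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0: float, k: float, r: float, sigma: float, t: float) -> float:
    """Closed-form call price from the Black-Scholes formula."""
    d_plus = (math.log(s0 / k) + t * (r + sigma ** 2 / 2)) / (sigma * math.sqrt(t))
    d_minus = d_plus - sigma * math.sqrt(t)
    return s0 * norm_cdf(d_plus) - k * math.exp(-r * t) * norm_cdf(d_minus)

def call_by_integration(s0: float, k: float, r: float, sigma: float, t: float,
                        half_width: float = 12.0, steps: int = 200_000) -> float:
    """exp(-rT) * E(a*exp(b*xi - b^2/2) - K)^+ with xi ~ N(0,1), by trapezoid rule."""
    a = s0 * math.exp(r * t)
    b = sigma * math.sqrt(t)
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        payoff = max(a * math.exp(b * x - b * b / 2) - k, 0.0)
        total += weight * payoff * math.exp(-0.5 * x * x)
    return math.exp(-r * t) * total * h / math.sqrt(2.0 * math.pi)

closed_form = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
numeric = call_by_integration(100.0, 100.0, 0.05, 0.2, 1.0)
print(closed_form, numeric)
```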


7.13 The Optimal Stopping Problem: Martingale Approach 


Problem 7.13.1. Prove that the random variable $\xi(\omega) = \sup_{\alpha \in \mathfrak{A}_0} \xi_\alpha(\omega)$, introduced in the proof of the Lemma in [P §7.13, 3], satisfies conditions (a) and (b) in the definition of essential supremum (see [P §7.13, 3]).

Hint. If $\alpha \notin \mathfrak{A}_0$, consider the expression $\mathsf{E} \max(\xi(\omega), \xi_\alpha(\omega))$.


Problem 7.13.2. Prove that the variable $\xi(\omega) = \tan \tilde{\xi}(\omega)$, introduced at the end of the proof of the Lemma in [P §7.13, 3], satisfies conditions (a) and (b) in the definition of essential supremum (see [P §7.13, 3]).

Problem 7.13.3. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically distributed random variables with $\mathsf{E}|\xi_1| < \infty$. Consider the optimal stopping problem within the class $\mathfrak{M}_1^\infty = \{\tau : 1 \le \tau < \infty\}$:

$$V^* = \sup_{\tau \in \mathfrak{M}_1^\infty} \mathsf{E}\Big( \max_{i \le \tau} \xi_i - c\tau \Big),$$

and let $\tau^* = \inf\{n \ge 1 : \xi_n \ge A^*\}$, where $A^*$ stands for the unique root of the equation $\mathsf{E}(\xi_1 - A^*)^+ = c$, with the understanding that $\inf \emptyset = \infty$. Prove that if $\mathsf{P}\{\tau^* < \infty\} = 1$, then the time $\tau^*$ is optimal in the class of all finite stopping times $\tau$, for which $\mathsf{E}(\max_{i \le \tau} \xi_i - c\tau)$ exists. Show also that $V^* = A^*$.
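To make the threshold $A^*$ concrete, consider the illustrative case (an assumption of this sketch, not part of the problem) where $\xi_1$ is uniform on $[0, 1]$ and $0 < c \le 1/2$. Then $\mathsf{E}(\xi_1 - A)^+ = (1 - A)^2/2$ for $0 \le A \le 1$, so $A^* = 1 - \sqrt{2c}$, and a bisection search recovers the same value.

```python
import math

def excess_uniform(a: float) -> float:
    """E(xi - a)^+ for xi uniform on [0, 1], valid for 0 <= a <= 1."""
    return (1.0 - a) ** 2 / 2.0

def solve_threshold(c: float, lo: float = 0.0, hi: float = 1.0,
                    tol: float = 1e-12) -> float:
    """Bisection for the root of excess_uniform(a) = c (the function is decreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess_uniform(mid) > c:
            lo = mid   # root lies to the right
        else:
            hi = mid   # root lies to the left (or at mid)
    return (lo + hi) / 2.0

cost = 0.02
print(solve_threshold(cost), 1.0 - math.sqrt(2.0 * cost))
```

With observation cost $c = 0.02$ the rule stops at the first observation of at least 0.8, and by the statement of the problem this threshold is also the value $V^*$.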


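Problem 7.13.4. In addition to the notation introduced in [P §7.13, 1] and [P §7.13, 2], let

$$\mathfrak{M}_n^\infty = \{\tau : n \le \tau < \infty\},$$

$$V_n^\infty = \sup_{\tau \in \mathfrak{M}_n^\infty} \mathsf{E} f_\tau,$$

$$v_n^\infty = \operatorname*{ess\,sup}_{\tau \in \mathfrak{M}_n^\infty} \mathsf{E}(f_\tau \mid \mathscr{F}_n),$$

$$\tau_n^\infty = \inf\{k \ge n : v_k^\infty = f_k\},$$

and assume that

$$\mathsf{E} \sup_n f_n^- < \infty.$$

Prove that the following statements can be made for the limiting random variables $\tilde{v}_n = \lim_{N \to \infty} v_n^N$:

(a) For every $\tau \in \mathfrak{M}_n^\infty$ one has

$$\tilde{v}_n \ge \mathsf{E}(f_\tau \mid \mathscr{F}_n).$$

(b) If $\tau_n^\infty \in \mathfrak{M}_n^\infty$, then

$$\tilde{v}_n = \mathsf{E}(f_{\tau_n^\infty} \mid \mathscr{F}_n),$$

$$\tilde{v}_n = v_n^\infty \quad \Big( = \operatorname*{ess\,sup}_{\tau \in \mathfrak{M}_n^\infty} \mathsf{E}(f_\tau \mid \mathscr{F}_n) \Big).$$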

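Problem 7.13.5. Adopt the notation introduced in the previous problem and let $\tau_n^\infty \in \mathfrak{M}_n^\infty$. By using (a) and (b) in the previous problem, conclude that the time $\tau_n^\infty$ is optimal, in the sense that

$$\operatorname*{ess\,sup}_{\tau \in \mathfrak{M}_n^\infty} \mathsf{E}(f_\tau \mid \mathscr{F}_n) = \mathsf{E}(f_{\tau_n^\infty} \mid \mathscr{F}_n) \quad (\mathsf{P}\text{-a.e.})$$

and

$$\sup_{\tau \in \mathfrak{M}_n^\infty} \mathsf{E} f_\tau = \mathsf{E} f_{\tau_n^\infty},$$

i.e., $V_n^\infty = \mathsf{E} f_{\tau_n^\infty}$.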


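Problem 7.13.6. Suppose that the family of random variables $X = \{\xi_\alpha(\omega);\ \alpha \in \mathfrak{A}\}$, defined on some probability space $(\Omega, \mathscr{F}, \mathsf{P})$, is chosen so that, for some fixed constant $C$, one has $\mathsf{E}|\xi_\alpha| \le C$ for all $\alpha \in \mathfrak{A}$. In addition, suppose that the family $X$ is “sufficiently rich,” in the sense that: if $\xi_{\alpha_1} \in X$ and $\xi_{\alpha_2} \in X$, for some $\alpha_1, \alpha_2 \in \mathfrak{A}$, then

$$\xi = \xi_{\alpha_1} I_A + \xi_{\alpha_2} I_{\overline{A}} \in X$$

for every $A \in \mathscr{F}$. (A family $X$ with these properties is said to admit needle variations.) Setting

$$Q(A) = \sup_{\alpha \in \mathfrak{A}} \mathsf{E}\xi_\alpha I_A, \quad A \in \mathscr{F},$$

prove that:
(a) The set function $Q = Q(\cdot)$ is σ-additive.
(b) $Q \ll \mathsf{P}$.
(c) The Radon-Nikodym derivative $\frac{dQ}{d\mathsf{P}}$ is given by

$$\frac{dQ}{d\mathsf{P}} = \operatorname*{ess\,sup}_{\alpha \in \mathfrak{A}} \xi_\alpha \quad (\mathsf{P}\text{-a.e.}).$$

(In particular, (c) above may be viewed as a proof of the fact that the essential supremum of a family of random variables that admits needle variations must be finite.)

Prove that the statements (a), (b) and (c) above remain valid if the condition $\mathsf{E}|\xi_\alpha| \le C$, $\alpha \in \mathfrak{A}$, is replaced with $\mathsf{E}\xi_\alpha^- < \infty$, $\alpha \in \mathfrak{A}$.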


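Problem 7.13.7. Let $\mathfrak{M}_n^\infty = \{\tau : n \le \tau < \infty\}$. Prove that if $\tau_1, \tau_2 \in \mathfrak{M}_n^\infty$ and $A \in \mathscr{F}_n$, then the time $\tau = \tau_1 I_A + \tau_2 I_{\overline{A}}$ belongs to $\mathfrak{M}_n^\infty$.


Problem 7.13.8. Let $(\Omega, \mathscr{F}, (\mathscr{F}_n)_{n \ge 0}, \mathsf{P})$ be any filtered probability space, and let $f_n$ be any $\mathscr{F}_n$-measurable random variable with $\mathsf{E} f_n^- < \infty$, for $n \ge 0$. Prove that, for every fixed $n \ge 0$, the family of random variables $\{\mathsf{E}(f_\tau \mid \mathscr{F}_n);\ \tau \in \mathfrak{M}_n^\infty\}$ admits needle variations.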


Chapter 8 
Sequences of Random Variables that Form 
Markov Chains 


8.1 Definitions and Basic Properties 


Problem 8.1.1. Prove the statements formulated as Problems 1a, 1b and 1c in the 
proof of [ P §8.1, Theorem 1]. 


Problem 8.1.2. Prove that in [P §8.1, Theorem 2] the function $\omega \to P_{n+1}(B - X_n(\omega))$ is $\mathscr{F}_n$-measurable.


Problem 8.1.3. By using [P §2.2, Lemma 3], prove the relations [P §8.1, 
(11) and (12)]. 


Problem 8.1.4. Prove the relations [ P §8.1, (20) and (27)]. 
Problem 8.1.5. Prove the identity in [ P §8.1, (33)]. 


Problem 8.1.6. Prove the relations (i), (ii) and (iii), formulated at the end of 
[P §8.1, 8]. 


Problem 8.1.7. Can one conclude from the Markov property [P §8.1, (3)] that for any choice of the sets $B_0, B_1, \ldots, B_n, B \in \mathscr{E}$, with $\mathsf{P}\{X_0 \in B_0, X_1 \in B_1, \ldots, X_n \in B_n\} > 0$, one must have:

$$\mathsf{P}(X_{n+1} \in B \mid X_0 \in B_0, X_1 \in B_1, \ldots, X_n \in B_n) = \mathsf{P}(X_{n+1} \in B \mid X_n \in B_n)?$$


Problem 8.1.8. Consider a cylindrical piece of chalk of length 1. Suppose that the piece is broken “randomly” into two pieces. Then the left piece is broken at “random” into two pieces, and so on. Let $X_n$ denote the length of the left piece after the $n$-th breaking, with the understanding that $X_0 = 1$, and let $\mathscr{F}_n = \sigma(X_1, \ldots, X_n)$. Thus, the conditional distribution of $X_{n+1}$ given $X_n = x$ must be uniform on $[0, x]$.

Prove that the sequence $(X_n)_{n \ge 0}$ forms a homogeneous Markov chain. In addition, prove that for every $\alpha > -1$ the sequence

$$M_n = (1 + \alpha)^n X_n^{\alpha}, \quad n \ge 0,$$


A.N. Shiryaev, Problems in Probability, Problem Books in Mathematics, 329 
DOI 10.1007/978-1-4614-3688-1_8, 
© Springer Science+Business Media New York 2012 


forms a non-negative martingale. Prove that with probability 1 for every $0 < p < e$ one has

$$\lim_n p^n X_n = 0,$$

and for every $p > e$ one has

$$\lim_n p^n X_n = \infty.$$

(Given that “on average” every piece is broken in half, one may expect that $X_n$ would converge to 0 as $2^{-n}$. However, the property $\lim_n p^n X_n = 0$ (P-a.e.) implies that $X_n$ converges to 0 much faster, “almost” as $e^{-n}$.)
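The “almost as $e^{-n}$” rate can be observed in a small simulation. The sketch below assumes the representation $X_n = U_1 \cdots U_n$ with independent uniform $U_k$ (a restatement of the conditional uniformity, not a quote from the text) and works with logarithms to avoid underflow; the seed and sample size are arbitrary choices.

```python
import math
import random

def log_length_after(n: int, seed: int = 0) -> float:
    """log X_n for the chalk model X_n = U_1 * ... * U_n, U_k uniform on (0, 1]."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the logarithm is always finite
    return sum(math.log(1.0 - rng.random()) for _ in range(n))

n = 200_000
decay_rate = -log_length_after(n) / n
print(decay_rate)  # close to 1, i.e. X_n behaves like exp(-n) on the log scale
```

Since $\mathsf{E}\log U_k = -1$, the law of large numbers gives $\frac{1}{n}\log X_n \to -1$, which is consistent with the threshold $p = e$ in the problem.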


Problem 8.1.9. Let $\xi_1, \xi_2, \ldots$ be any sequence of independent and identically distributed random variables, which can be associated with a Bernoulli scheme, i.e., $\mathsf{P}\{\xi_n = 1\} = \mathsf{P}\{\xi_n = -1\} = \frac{1}{2}$, $n \ge 1$, and let $S_0 = 0$, $S_n = \xi_1 + \ldots + \xi_n$ and $M_n = \max\{S_k : 0 \le k \le n\}$, $n \ge 1$.

(a) Do the sequences $(|S_n|)_{n \ge 0}$, $(|M_n|)_{n \ge 0}$ and $(M_n - S_n)_{n \ge 0}$ represent Markov chains?

(b) Are these sequences going to be Markov chains if $S_0 = x \ne 0$ and $S_n = x + \xi_1 + \ldots + \xi_n$?


Problem 8.1.10. Consider the Markov chain $(X_n)_{n \ge 0}$, with state space $E = \{-1, 0, 1\}$, and suppose that $p_{ij} > 0$, for $i, j \in E$. Give necessary and sufficient conditions for the sequence $(|X_n|)_{n \ge 0}$ to be a Markov chain.


Problem 8.1.11. Give an example of a sequence of random variables $X = (X_n)_{n \ge 0}$, which is not a Markov chain, but for which the Chapman-Kolmogorov equation nevertheless holds.


Problem 8.1.12. Suppose that the sequence $X = (X_n)_{n \ge 0}$ forms a Markov chain in broad sense, and let $Y_n = X_{n+1} - X_n$, for $n \ge 0$. Prove that the sequence $(X, Y) = ((X_n)_{n \ge 0}, (Y_n)_{n \ge 0})$ is also a Markov chain. Does any of the following sequences represent a Markov chain: $(X_n, X_{n+1})_{n \ge 0}$, $(X_{2n})_{n \ge 0}$, $(X_{n+k})_{n \ge 0}$ for $k \ge 1$?


Problem 8.1.13. We say that a sequence of random variables $X = (X_n)_{n \ge 0}$, in which every $X_n$ takes values in some countable set $E$, forms a Markov chain of order $r \ge 1$, if

$$\mathsf{P}(X_{n+1} = i_{n+1} \mid X_0 = i_0, \ldots, X_n = i_n) = \mathsf{P}(X_{n+1} = i_{n+1} \mid X_{n-r+1} = i_{n-r+1}, \ldots, X_n = i_n),$$

for all $i_0, \ldots, i_{n+1}$, $n \ge r$.
Assuming that $X = (X_n)_{n \ge 0}$ is a Markov chain of order $r \ge 1$, let $\widetilde{X}_n = (X_n, X_{n+1}, \ldots, X_{n+r-1})$, $n \ge 0$. Prove that the sequence $\widetilde{X} = (\widetilde{X}_n)_{n \ge 0}$ represents a canonical (i.e., of order $r = 1$) Markov chain.

Problem 8.1.14. (Random walk on groups.) Let $G$ be some finite group, endowed with binary operation $\oplus$, so that the usual group properties hold:

(i) $x, y \in G$ implies $x \oplus y \in G$;

(ii) if $x, y, z \in G$, then $x \oplus (y \oplus z) = (x \oplus y) \oplus z$;

(iii) there is a unique $e \in G$, such that $x \oplus e = e \oplus x = x$, for all $x \in G$;

(iv) given any $x \in G$, there is an inverse $-x \in G$, which is characterized by the property $x \oplus (-x) = (-x) \oplus x = e$.

Let $\xi_0, \xi_1, \xi_2, \ldots$ be any sequence of random elements in $G$, which are identically distributed with law $Q(g) = \mathsf{P}\{\xi_n = g\}$, $g \in G$, $n \ge 0$.

Prove that the random walk $X = (X_n)_{n \ge 0}$, given by $X_n = \xi_0 \oplus \xi_1 \oplus \ldots \oplus \xi_n$, forms a Markov chain and give the respective transition probability matrix.
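For a concrete instance of Problem 8.1.14, take the cyclic group $\mathbb{Z}_n$ with addition modulo $n$ and assume, in addition, that the steps $\xi_k$ are independent. Then the one-step transition probability from $x$ to $y$ is $Q((-x) \oplus y) = Q(y - x \bmod n)$. The sketch below builds this matrix with exact fractions; the step law `q` is an arbitrary illustrative choice.

```python
from fractions import Fraction

def transition_matrix(q):
    """Transition matrix of the random walk on Z_n driven by the step law q.

    Entry [x][y] is q[(y - x) mod n], the probability that the next step
    carries x to y under addition modulo n.
    """
    n = len(q)
    return [[q[(y - x) % n] for y in range(n)] for x in range(n)]

q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # law of a step on Z_3
P = transition_matrix(q)
row_sums = [sum(row) for row in P]
column_sums = [sum(P[x][y] for x in range(len(q))) for y in range(len(q))]
print(P, row_sums, column_sums)
```

Both the rows and the columns sum to 1, so the matrix is doubly stochastic and the uniform distribution on the group is invariant for this walk.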


Problem 8.1.15. (Random walk on a circle.) Let $\xi_1, \xi_2, \ldots$ be any sequence of independent random variables that are identically distributed in the interval $[0, 1]$, with a (common) continuous probability density $f(x)$. For a fixed $x \in [0, 1)$, consider the sequence $X = (X_n)_{n \ge 0}$, given by $X_0 = x$ and

$$X_n = x + \xi_1 + \ldots + \xi_n \pmod 1.$$

Prove that $X = (X_n)_{n \ge 0}$ is a Markov chain with state space $E = [0, 1)$. Find the transition function for this Markov chain.


Problem 8.1.16. Suppose that $X = (X_n)_{n \ge 0}$ and $Y = (Y_n)_{n \ge 0}$ are two independent Markov chains, defined on the same probability space $(\Omega, \mathscr{F}, \mathsf{P})$, taking values in the same countable space $E = \{i, j, \ldots\}$, and sharing the same transition probability matrix. Prove that, for any choice of the initial values, $X_0 = x \in E$ and $Y_0 = y \in E$, the sequence $(X, Y) = (X_n, Y_n)_{n \ge 0}$ forms a Markov chain. Find the transition probability matrix for this Markov chain.


Problem 8.1.17. Let $X_1, X_2, \ldots$ be any sequence of independent and identically distributed non-negative random variables that share a common continuous distribution function. Define the record moments:

$$\mathscr{R}_1 = 1, \quad \mathscr{R}_k = \inf\{n > \mathscr{R}_{k-1} : X_n \ge \max(X_1, \ldots, X_{n-1})\}, \quad k \ge 2,$$

and prove that $\mathscr{R} = (\mathscr{R}_k)_{k \ge 1}$ is a Markov chain. Find the transition probability matrix for this Markov chain.


Problem 8.1.18. Let $X_1, X_2, \ldots$ be some sequence of independent and identically distributed non-negative random variables that share the same discrete range of values. Assuming that the record times $\mathscr{R} = (\mathscr{R}_k)_{k \ge 1}$ are defined as in the previous problem, prove that the associated sequence of record values $V = (V_k)_{k \ge 1}$, with $V_k = X_{\mathscr{R}_k}$, forms a Markov chain. Find the transition probability matrix for this Markov chain.

Problem 8.1.19. (Time reversibility for Markov chains.) Suppose that $X = (X_n)_{0 \le n \le N}$ is some irreducible Markov chain with a countable state space $E$, with transition probability matrix $P = \|p_{ij}\|$, and with invariant initial distribution $q = (q_i)$, such that $q_i > 0$ for all $i \in E$ (for the definition of invariant distribution see [P §8.3, 1]).

Next, consider the sequence $\widetilde{X}^{(N)} = (\widetilde{X}_n)_{0 \le n \le N}$, given by $\widetilde{X}_n = X_{N-n}$, which is nothing but the sequence $X$ in reverse time. Setting $\widetilde{P} = \|\tilde{p}_{ij}\|$, where

$$\tilde{p}_{ij} = \frac{q_j}{q_i}\, p_{ji},$$

prove that the matrix $\widetilde{P}$ is stochastic. In addition, prove that $\widetilde{X}^{(N)}$ is a Markov chain with transition matrix $\widetilde{P}$.


Remark. The Markov property comes down to saying that, conditioned on the “present”, the “past” and the “future” are independent (see [P §8.1, (7)]). Because of this symmetry between past and future, one is led to suspect that the Markov
property of the sequence X = (X),)o<n<n may be preserved under time reversal, 
provided that in reverse time the initial distribution is chosen in a certain way. The 
statement in this problem makes this idea precise: the Markov property is preserved 
under time-reversal, possibly with a different transition probability matrix, provided 
that the initial distribution is chosen to be the invariant one. 


Problem 8.1.20. (Reversible Markov Chains.) Let $X = (X_n)_{n \ge 0}$ be any Markov chain with countable state space $E$, with transition probability matrix $P = \|p_{ij}\|$, and with invariant distribution $q = (q_i)$. We say that the $(q, P)$-Markov chain $X = (X_n)_{n \ge 0}$ is reversible (see, for example, [22]) if, for every $N \ge 1$, the sequence $\widetilde{X}^{(N)} = (\widetilde{X}_n)_{0 \le n \le N}$, given by $\widetilde{X}_n = X_{N-n}$, is also a $(q, P)$-Markov chain.

Prove that an irreducible $(q, P)$-Markov chain is reversible if and only if the following condition holds:

$$q_i p_{ij} = q_j p_{ji}, \quad \text{for all } i, j \in E.$$

Convince yourself that, if the distribution $\lambda = (\lambda_i)$ ($\lambda_i \ge 0$, $\sum_i \lambda_i = 1$) and the matrix $P$ satisfy the balance equation

$$\lambda_i p_{ij} = \lambda_j p_{ji}, \quad i, j \in E,$$

then $\lambda = (\lambda_i)$ coincides with the invariant distribution $q = (q_i)$.


Problem 8.1.21. Consider the Ehrenfests’ model (see [P §8.8, 3]) with stationary distribution $q_i = C_N^i (1/2)^N$, $i = 0, 1, \ldots, N$, and prove that the following balance equation is satisfied:

$$q_i\, p_{i, i+1} = q_{i+1}\, p_{i+1, i}.$$

(Note that in this model $p_{ij} = 0$, if $|i - j| > 1$.)
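The balance equation of Problem 8.1.21 can be checked exactly for small $N$. The sketch below assumes the standard Ehrenfest transition probabilities $p_{i,i+1} = (N - i)/N$ and $p_{i,i-1} = i/N$; the version of the model in [P §8.8, 3] may be parameterized differently, so this is an illustration rather than a restatement of that section.

```python
from fractions import Fraction
from math import comb

def ehrenfest_balance_holds(N: int) -> bool:
    """Check q_i p_{i,i+1} = q_{i+1} p_{i+1,i} for the assumed Ehrenfest chain."""
    q = [Fraction(comb(N, i), 2 ** N) for i in range(N + 1)]  # binomial(N, 1/2)

    def up(i: int) -> Fraction:
        return Fraction(N - i, N)   # assumed p_{i,i+1}

    def down(i: int) -> Fraction:
        return Fraction(i, N)       # assumed p_{i,i-1}

    return all(q[i] * up(i) == q[i + 1] * down(i + 1) for i in range(N))

print([ehrenfest_balance_holds(N) for N in (1, 2, 5, 10)])
```

The check reduces to the binomial identity $C_N^i (N - i) = C_N^{i+1} (i + 1)$.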
Problem 8.1.22. Prove that a Markov chain with transition probability matrix

$$P = \begin{pmatrix} 0 & 2/3 & 1/3 \\ 1/3 & 0 & 2/3 \\ 2/3 & 1/3 & 0 \end{pmatrix}$$

has invariant distribution $q = (1/3, 1/3, 1/3)$. Convince yourself that, for any $N \ge 1$, the sequence $\widetilde{X}^{(N)} = (X_{N-n})_{0 \le n \le N}$ forms a Markov chain with transition probability matrix $\widetilde{P}$, which is simply the transpose of $P$. Argue that the chain $X = (X_n)_{n \ge 0}$ is not reversible and give the intuition behind this feature.
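The claims in Problem 8.1.22 can be confirmed with exact arithmetic. The sketch below uses the reversed-time matrix with entries $q_j p_{ji} / q_i$ (the standard time-reversal formula) and the balance condition from Problem 8.1.20; the helper names are illustrative.

```python
from fractions import Fraction as Fr

P = [[Fr(0), Fr(2, 3), Fr(1, 3)],
     [Fr(1, 3), Fr(0), Fr(2, 3)],
     [Fr(2, 3), Fr(1, 3), Fr(0)]]
q = [Fr(1, 3)] * 3

def is_invariant(q, P) -> bool:
    """True if q P = q."""
    n = len(q)
    return all(sum(q[i] * P[i][j] for i in range(n)) == q[j] for j in range(n))

def reversed_matrix(q, P):
    """Transition matrix of the time-reversed chain: entries q_j p_ji / q_i."""
    n = len(q)
    return [[q[j] * P[j][i] / q[i] for j in range(n)] for i in range(n)]

def satisfies_balance(q, P) -> bool:
    """True if q_i p_ij = q_j p_ji for all i, j."""
    n = len(q)
    return all(q[i] * P[i][j] == q[j] * P[j][i] for i in range(n) for j in range(n))

transpose = [[P[j][i] for j in range(3)] for i in range(3)]
print(is_invariant(q, P), reversed_matrix(q, P) == transpose, satisfies_balance(q, P))
```

Intuitively, the chain prefers to move around the cycle $0 \to 1 \to 2 \to 0$ (probability 2/3), while the reversed chain prefers the opposite direction, so the two cannot have the same law.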


Problem 8.1.23. Let $X = (X_n)_{n \ge 0}$ be any stationary (in strict sense) and non-negative Gaussian sequence. Prove that this sequence has the Markov property if and only if the covariance $\operatorname{cov}(X_n, X_{n+m})$, $m, n \ge 0$, has the form:

$$\operatorname{cov}(X_n, X_{n+m}) = \sigma^2 \rho^m,$$

for some choice of $\sigma > 0$ and $-1 \le \rho \le 1$.


8.2 The Strong and the Generalized Markov Properties 


Problem 8.2.1. Prove that the function $\psi(x) = \mathsf{E}_x H$, introduced in the Remark in [P §8.2, 1], is $\mathscr{E}$-measurable.


Problem 8.2.2. Prove the relation in [ P §8.2, (12)]. 
Problem 8.2.3. Prove the relation in [ P §8.2, (13)]. 


Problem 8.2.4. Are the random variables $X_n - X_{\tau \wedge n}$ and $X_{\tau \wedge n}$, from the Example in [P §8.2, 3], independent?


Problem 8.2.5. Prove the formula in [P §8.2, (23)].


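Problem 8.2.6. Suppose that the space E is at most countable, let $(\Omega, \mathscr{F}) = (E^\infty, \mathscr{E}^\infty)$,
and let $\theta_n : \Omega \to \Omega$, $n \ge 1$, denote the usual shift operators

$$\theta_n(\omega) = (x_n, x_{n+1}, \ldots), \quad \omega = (x_0, x_1, \ldots).$$

Let $X = (X_n(\omega))_{n \ge 0}$ be the canonical coordinate process on $\Omega$, defined as
$X_n(\omega) = x_n$, $\omega = (x_0, x_1, \ldots)$, for $n \ge 0$.
Given any $\mathscr{F}$-measurable function $H = H(\omega)$, set (see [P §8.2, (1)])

$$(H \circ \theta_n)(\omega) = H(\theta_n(\omega)), \quad n \ge 1,$$

and, given any $B \in \mathscr{F}$, set (comp. with [ P §5.1, Definition 2])

$$\theta_n^{-1}(B) = \{\omega : \theta_n(\omega) \in B\}, \quad n \ge 1.$$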


With the above definitions in mind, prove the following properties:
(a) For any $m \ge 0$ and $n \ge 1$, one has

$$X_m \circ \theta_n = X_{m+n},$$

i.e., $(X_m \circ \theta_n)(\omega) = X_{m+n}(\omega)$.
(b) For any $m \ge 0$ and $n \ge 1$, one has

$$\theta_n^{-1}\{X_m \in A\} = \{X_m \circ \theta_n \in A\} = \{X_{m+n} \in A\},$$

i.e., for every $A \in \mathscr{E}$,

$$\theta_n^{-1}\{\omega : X_m(\omega) \in A\} = \{\omega : (X_m \circ \theta_n)(\omega) \in A\} = \{\omega : X_{m+n}(\omega) \in A\};$$

and, more generally,

$$\theta_n^{-1}\{X_0 \in A_0, \ldots, X_m \in A_m\} = \{X_0 \circ \theta_n \in A_0, \ldots, X_m \circ \theta_n \in A_m\} = \{X_n \in A_0, \ldots, X_{m+n} \in A_m\}.$$

In addition, prove that

$$\theta_n^{-1}(\mathscr{F}_m) = \sigma(X_n, \ldots, X_{m+n}), \qquad (*)$$

with the obvious meaning of the symbols $\theta_n^{-1}(\mathscr{F}_m)$ and $\sigma(X_n, \ldots, X_{m+n})$ (explain).


Problem 8.2.7. Adopt the notation introduced in Problem 8.2.6, let $H = H(\omega)$ be
any $\mathscr{F}$-measurable function on $(\Omega, \mathscr{F})$, and let $A \in \mathscr{B}(R)$. Prove that

$$(H \circ \theta_n)^{-1}(A) = \theta_n^{-1}(H^{-1}(A)). \qquad (**)$$


Problem 8.2.8. Adopt the notation introduced in Problem 8.2.6 and let $\tau = \tau(\omega)$
be some stopping time (i.e., a finite Markov moment) on $(\Omega, \mathscr{F}, (\mathscr{F}_k)_{k \ge 0})$, where
$\mathscr{F}_k = \sigma(X_0, X_1, \ldots, X_k)$, $k \ge 0$. Based on (**) and (*) in Problems 8.2.7
and 8.2.6, prove that, for any given $n \ge 0$, the moment $n + \tau \circ \theta_n$ is also a stopping
time, i.e., $\{\omega : n + (\tau \circ \theta_n)(\omega) = m\} \in \mathscr{F}_m$, for every $m \ge n$.


Warning: Problems 8.2.9-8.2.21 below assume the notation and the terminology 
introduced in Problems 8.2.6 and 8.2.8. 


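Problem 8.2.9. Let $\sigma = \sigma(\omega)$ be any stopping time on $(\Omega, \mathscr{F}, (\mathscr{F}_k)_{k \ge 0})$ and let
$H = H(\omega)$ be any $\mathscr{F}$-measurable function on $\Omega$. The symbol $(H \circ \theta_\sigma)(\omega)$ is
understood as the function $H(\theta_{\sigma(\omega)}(\omega))$, i.e., $H(\theta_n(\omega))$, for $\omega \in \{\omega : \sigma(\omega) = n\}$.
As a generalization of Problem 8.2.8, prove that $\sigma + \tau \circ \theta_\sigma$ is also a stopping time.


Problem 8.2.10. Given any two stopping times, $\tau$ and $\sigma$, on $(\Omega, \mathscr{F}, (\mathscr{F}_k)_{k \ge 0})$,
the random variable $X_\tau \circ \theta_\sigma$ will be understood as $X_{\tau(\theta_\sigma(\omega))}(\theta_\sigma(\omega))$, i.e., as
$X_{\tau(\theta_n(\omega))}(\theta_n(\omega))$, for $\omega \in \{\omega : \sigma(\omega) = n\}$, for any $n \ge 0$. As a generalization
of the property $X_m \circ \theta_n = X_{m+n}$ from Problem 8.2.6, prove that

$$X_\tau \circ \theta_\sigma = X_{\tau \circ \theta_\sigma + \sigma}.$$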




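Problem 8.2.11. Given any set $B \in \mathscr{E}$, let

$$\tau_B(\omega) = \inf\{n \ge 0 : X_n(\omega) \in B\} \quad \text{and} \quad \sigma_B(\omega) = \inf\{n > 0 : X_n(\omega) \in B\}$$

denote, respectively, the time of the first visit, and the time of the first visit after time 0, of
the sequence X to the set B. Suppose that the times $\tau_B(\omega)$ and $\sigma_B(\omega)$ are finite for
all $\omega \in \Omega$, and let $\gamma = \gamma(\omega)$ be any stopping time on $(\Omega, \mathscr{F}, (\mathscr{F}_k)_{k \ge 0})$.

Prove that $\tau_B$ and $\sigma_B$ are stopping times and, furthermore,

$$\gamma + \tau_B \circ \theta_\gamma = \inf\{n \ge \gamma : X_n \in B\}, \qquad \gamma + \sigma_B \circ \theta_\gamma = \inf\{n > \gamma : X_n \in B\}.$$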


Argue that, after appropriate change in the respective definitions, the above relations 
remain valid even in the case where the stopping times y, tg and og may take 
infinite values and the sets in the right sides may be empty. 
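
The shift notation of Problems 8.2.6 to 8.2.11 can be made concrete on a finite path. The sketch below is an illustration only: it assumes a path truncated to finitely many coordinates (a real $\omega$ is an infinite sequence), a deterministic time n in place of the stopping time $\gamma$, and hypothetical helper names `theta`, `X`, `tau` and `first_visit_from`.

```python
def theta(n, omega):
    """Shift operator: drop the first n coordinates of the path omega."""
    return omega[n:]

def X(m, omega):
    """Canonical coordinate process: X_m(omega) = x_m."""
    return omega[m]

def tau(B, omega):
    """First n >= 0 with x_n in B; None if B is not visited on this finite path."""
    for n, x in enumerate(omega):
        if x in B:
            return n
    return None

def first_visit_from(n, B, omega):
    """inf{k >= n : x_k in B}, computed directly; None if there is no such k."""
    for k in range(n, len(omega)):
        if omega[k] in B:
            return k
    return None

omega = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3)   # an arbitrary truncated path
B = {5}

# X_m o theta_n = X_{m+n}
shift_ok = all(X(m, theta(n, omega)) == X(m + n, omega)
               for n in range(1, 5) for m in range(5))

# n + tau_B o theta_n = inf{k >= n : X_k in B}, for deterministic times n
hit_ok = all(n + tau(B, theta(n, omega)) == first_visit_from(n, B, omega)
             for n in range(9))

print(shift_ok, hit_ok)
```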


Problem 8.2.12. Let $\tau$ and $\sigma$ be any two Markov times. Prove that $\nu = \tau \circ \theta_\sigma + \sigma$,
with the understanding that $\nu = \infty$ on the set $\{\sigma = \infty\}$, is also a Markov time.


Problem 8.2.13. Prove that the strong Markov property [P §8.2, (7)], from
[P §8.2, Theorem 2], remains valid for every Markov time $\tau \le \infty$, and can be
expressed as

$$E_\pi[I(\tau < \infty)(H \circ \theta_\tau) \mid \mathscr{F}_\tau] = I(\tau < \infty)\, E_{X_\tau} H \quad (P_\pi\text{-a.e.}).$$

(Recall that H is a bounded and non-negative $\mathscr{F}$-measurable function and $E_{X_\tau} H$ is
a random variable of the form $\psi(X_\tau)$, where $\psi(x) = E_x H$.)

In addition, prove that, if $K = K(\omega)$ is some $\mathscr{F}_\tau$-measurable function and H
and K are either bounded or non-negative, then, for every Markov time $\tau \le \infty$, one
has

$$E_\pi[(I(\tau < \infty) K)(H \circ \theta_\tau)] = E_\pi[(I(\tau < \infty) K)\, E_{X_\tau} H].$$


Problem 8.2.14. Prove that the sequence $(X_{\tau \wedge n}, P_x)_{n \ge 0}$, $x \in E$, introduced in
[P §8.2, 3], is a Markov chain. Does this property hold for an arbitrary Markov
chain (with countable state space) and for an arbitrary Markov time of the form
$\tau = \inf\{n \ge 0 : X_n \in A\}$, for some choice of the set $A \subseteq E$ (comp. with [P §8.2,
(15)])?


Problem 8.2.15. Let h = h(x) be a non-negative function and let H(x) = 
(Uh)(x) be the potential of h (see Sect. A.7). Prove that H(x) is the minimal 
solution of the equation V(x) = h(x) + TV(x), within the class of non-negative 
functions V = V(x). 


Problem 8.2.16. Given any $y^\circ \in E$, prove that the Green function $G(x, y^\circ)$ is the
minimal non-negative solution to the system

$$V(x) = \begin{cases} 1 + TV(x), & x = y^\circ, \\ TV(x), & x \ne y^\circ. \end{cases}$$


Problem 8.2.17. Prove that if $\tau$ and $\sigma$ are any two Markov times and $T_n$, $n \ge 0$,
are the transition operators associated with $X = (X_n)_{n \ge 0}$, then:

$$T_\sigma T_\tau = T_{\sigma + \tau \circ \theta_\sigma}.$$

Hint. Use the strong Markov property and the identity $X_\tau \circ \theta_\sigma = X_{\tau \circ \theta_\sigma + \sigma}$,
established in Problem 8.2.10.

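Problem 8.2.18. Given any domain $D \in \mathscr{E}$, let

$$\tau(D) = \inf\{n \ge 0 : X_n \in D\} \quad \text{and} \quad \sigma(D) = \inf\{n > 0 : X_n \in D\}.$$

Prove that

$$X_{\sigma(D)} = X_{\tau(D)} \circ \theta_1 \quad \text{on } \{\sigma(D) < \infty\},$$
$$T_{\sigma(D)} = T\, T_{\tau(D)}.$$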


Problem 8.2.19. With the notation introduced in the previous two problems, let
$g \ge 0$ and $V_D(x) = T_{\tau(D)} g(x)$. Prove that $V_D(x)$ is the smallest non-negative
solution to the system

$$V(x) = \begin{cases} g(x), & x \in D, \\ TV(x), & x \notin D. \end{cases}$$

In particular, if $g \equiv 1$, then the function $V_D(x) = P_x\{\tau(D) < \infty\}$ is the smallest
non-negative solution to the system

$$V(x) = \begin{cases} 1, & x \in D, \\ TV(x), & x \notin D. \end{cases}$$


Problem 8.2.20. By using the strong Markov property, prove that the function
$m_D(x) = E_x \tau(D)$ solves the system:

$$V(x) = \begin{cases} 0, & x \in D, \\ 1 + TV(x), & x \notin D. \end{cases}$$


In addition, prove that mp(x) is the smallest non-negative solution to the above 
system. 
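
To see what "smallest non-negative solution" means in practice, one can iterate the system of Problem 8.2.20 starting from $V \equiv 0$; the iterates increase to the minimal solution. The sketch below assumes a concrete example that is not in the text (the symmetric walk on {0, ..., 4} stopped at D = {0, 4}) and a hypothetical function name `minimal_solution`.

```python
def minimal_solution(P, D, steps=2000):
    """Iterate V <- (0 on D, 1 + TV off D) starting from V = 0.

    The iterates increase monotonically, so the limit is the smallest
    non-negative solution of the system in Problem 8.2.20.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(steps):
        V = [0.0 if x in D else 1.0 + sum(P[x][y] * V[y] for y in range(n))
             for x in range(n)]
    return V

# Assumed example: simple symmetric walk on {0,...,N}, with D = {0, N}.
N = 4
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = P[N][N] = 1.0
for x in range(1, N):
    P[x][x - 1] = P[x][x + 1] = 0.5

m = minimal_solution(P, D={0, N})
print(m)  # close to x * (N - x), the classical mean exit time
```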


Problem 8.2.21. Prove that any non-negative excessive function f = f(x) admits
the Riesz decomposition:

$$f(x) = \widetilde{f}(x) + Uh(x),$$

in which

$$\widetilde{f}(x) = \lim_n (T_n f)(x)$$

is a harmonic function and $Uh(x)$ is the potential of the function

$$h(x) = f(x) - Tf(x).$$


Problem 8.2.22. Let $X = (X_n)_{n \ge 1}$ and $Y = (Y_n)_{n \ge 1}$ be any two independent
Markov chains, with the same state space E = {1,2} and the same transition
probability matrix (,%, 's), for some choice of $\alpha, \beta \in (0,1)$. Let
$\tau = \inf\{n \ge 0 : X_n = Y_n\}$ (with $\inf \varnothing = \infty$) be the time of the first meeting between X and Y.
Find the probability distribution of the time $\tau$.


Problem 8.2.23. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any stochastic sequence and let $B \in \mathscr{B}(R)$.
As was already established, the random variables $\tau_B = \inf\{n \ge 0 : X_n \in B\}$
and $\sigma_B = \inf\{n > 0 : X_n \in B\}$ (with $\inf \varnothing = \infty$) are Markov times. Prove that, for
any fixed integer $N \ge 0$, the last visit of B between times 0 and N, i.e., the random
variable

$$\gamma_B = \sup\{0 \le n \le N : X_n \in B\} \quad (\text{with } \sup \varnothing = 0),$$

is not a Markov time.


Problem 8.2.24. Prove that the statements in Theorems 1 and 2 in [ P §8.2] remain
valid if the requirement for the function $H = H(\omega)$ to be bounded is replaced by
the requirement that this function is non-negative.


Problem 8.2.25. Let $X = (X_n, \mathscr{F}_n)_{n \ge 0}$ be any Markov sequence and let $\tau$ be any
Markov time. Prove that the random sequence $\widehat{X} = (\widehat{X}_n, \widehat{\mathscr{F}}_n)_{n \ge 0}$, with

$$\widehat{X}_n = X_{n+\tau} \quad \text{and} \quad \widehat{\mathscr{F}}_n = \mathscr{F}_{n+\tau},$$

is also a Markov sequence, which, in fact, has the same transition function as the
sequence X. (This fact may be seen as the simplest form of the strong Markov
property.)


Problem 8.2.26. Let $(X_1, X_2, \ldots)$ be any sequence of independent and identically
distributed random variables, with common distribution function F = F(x). Set
$\mathscr{F}_0 = \{\varnothing, \Omega\}$, $\mathscr{F}_n = \sigma(X_1, \ldots, X_n)$, $n \ge 1$, let $\tau$ be any Markov time for $(\mathscr{F}_n)_{n \ge 0}$,
and let $A \in \mathscr{F}_\tau$.

Assuming that $\tau$ is globally bounded, i.e., $0 \le \tau(\omega) \le T < \infty$, for $\omega \in \Omega$, prove
that:

(a) The variables $I_A, X_{1+\tau}, X_{2+\tau}, \ldots$ are independent.

(b) The variables $X_{n+\tau}$ share the same distribution function F = F(x), i.e.,
$\mathrm{Law}(X_{n+\tau}) = \mathrm{Law}(X_1)$, $n \ge 1$.

(One consequence of (a) and (b) above is that the probabilistic structure of
the sequence $(X_{1+\tau}, X_{2+\tau}, \ldots)$ is the same as the probabilistic structure of the
sequence $(X_1, X_2, \ldots)$, i.e., $\mathrm{Law}(X_{1+\tau}, X_{2+\tau}, \ldots) = \mathrm{Law}(X_1, X_2, \ldots)$; plainly, the
distribution of the sequence $(X_n)_{n \ge 1}$ is invariant under the random shift $n \mapsto n + \tau$.)

Suppose now that $\tau$ is any, i.e., not necessarily bounded, Markov time, with
$0 \le \tau \le \infty$. Prove that, in this case, property (a) can be written in the form:

$$P(A \cap \{\tau < \infty\};\ X_{1+\tau} \le x_1, \ldots, X_{n+\tau} \le x_n) = P(A \cap \{\tau < \infty\})\, F(x_1) \cdots F(x_n),$$

which relation must hold for all $n \ge 1$ and $x_n \in R$.
Hint. It is enough to notice that

$$P(A \cap \{\tau < \infty\};\ X_{1+\tau} \le x_1, \ldots, X_{n+\tau} \le x_n) = \sum_{k=0}^{\infty} P(A \cap \{\tau = k\};\ X_{1+k} \le x_1, \ldots, X_{n+k} \le x_n),$$

where the events $A \cap \{\tau = k\}$ and $\{X_{1+k} \le x_1, \ldots, X_{n+k} \le x_n\}$ are independent.


8.3 Limiting, Ergodic and Stationary Distributions 
of Markov Chains 


Problem 8.3.1. Give examples of Markov chains for which the limit
$\pi_j = \lim_n p_{ij}^{(n)}$ exists and

(a) Does not depend on the initial state i.

(b) Does depend on the initial state i.


Problem 8.3.2. Give examples of ergodic and non-ergodic chains. 


Problem 8.3.3. Give an example of a Markov chain that has a non-ergodic 
stationary distribution. 


Problem 8.3.4. Give an example of a transition probability matrix for which any 
probability distribution on the respective state space is a stationary distribution. 


8.4 Markov Chain State Classification Based 
on the Transition Probability Matrix 


Problem 8.4.1. Formulate the notions of “essential” and “inessential” states (see
[P §8.4, 1]) in terms of the transition probabilities $p_{ij}^{(n)}$, $i, j \in E$, $n \ge 1$.

Problem 8.4.2. Let P be the transition probability matrix for some irreducible
Markov chain, and suppose that P has the additional property that $P^2 = P$. Describe
the structure of the matrix P.


Problem 8.4.3. Let P denote the transition probability matrix for some finite
Markov chain $X = (X_n)_{n \ge 0}$. Suppose that $\sigma_1, \sigma_2, \ldots$ is some sequence of independent
and identically distributed non-negative integer-valued random variables,
which are also independent from X. Let $\tau_0 = 0$ and $\tau_n = \sigma_1 + \ldots + \sigma_n$, $n \ge 1$.
Prove that the sequence $\widetilde{X} = (\widetilde{X}_n)_{n \ge 0}$ given by $\widetilde{X}_n = X_{\tau_n}$ is a Markov chain
and find the transition probability matrix $\widetilde{P}$ for this chain. Prove that if the states i
and j communicate for the chain X, then these two states must communicate also
for the chain $\widetilde{X}$.


Problem 8.4.4. Consider the Markov chain $X = (X_n)_{n \ge 0}$, with state space E =
{0, 1}, and suppose that its transition probability matrix is given by
$P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}$,
for some choice of $\alpha, \beta \in (0, 1)$. Then define the Markov moment

$$\nu = \inf\{n \ge 1 : X_{n-1} = X_n = 0\}$$

and prove that

$$E_0 \nu = \frac{2 - (\alpha + \beta)}{\alpha(1 - \beta)}.$$
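
A quick numerical sanity check of the formula for $E_0 \nu$ is possible through first-step analysis. The sketch below assumes the matrix $P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}$ (rows indexed by the states 0 and 1) and writes a, b for the expected remaining time when the current state is 0 or 1; the function name and the chosen parameter values are illustrative only.

```python
def mean_time_to_double_zero(alpha, beta, steps=5000):
    """Fixed-point iteration for the first-step equations

        a = 1 + (1 - alpha) * b            (current state 0)
        b = 1 + beta * b + (1 - beta) * a  (current state 1)

    started from a = b = 0; the iterates increase to the expected times.
    """
    a = b = 0.0
    for _ in range(steps):
        a, b = 1.0 + (1.0 - alpha) * b, 1.0 + beta * b + (1.0 - beta) * a
    return a

alpha, beta = 0.3, 0.6
numeric = mean_time_to_double_zero(alpha, beta)
formula = (2.0 - (alpha + beta)) / (alpha * (1.0 - beta))
print(numeric, formula)
```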

Problem 8.4.5. Consider the Markov chain with state-space E = {1,2,3} and
transition probability matrix

$$P = \begin{pmatrix} \alpha & 1-\alpha & 0 \\ 0 & \beta & 1-\beta \\ 1-\gamma & 0 & \gamma \end{pmatrix},$$

where $\alpha, \beta, \gamma \in (0, 1)$. Prove that this Markov chain is irreducible. What can be said
about the existence of a stationary distribution for this Markov chain?


Problem 8.4.6. Explain whether it may be possible for all states of a given Markov 
chain to be inessential in each of the following two cases: 


1. The state space is finite. 
2. The state space is countably infinite. 


8.5 Markov Chain State Classification Based 
on the Asymptotics of the Transition Probabilities 


Problem 8.5.1. Prove that an irreducible Markov chain, with state space
{0,1,2,...} and with transition probabilities $p_{ij}$, is transient if and only if the
system of equations $u_j = \sum_i u_i p_{ij}$, $j = 0, 1, \ldots$, admits a bounded solution $u_j$,
$j = 0, 1, \ldots$, which is not constant (i.e., $u_i \ne u_j$, for at
least one pair (i, j)).

Problem 8.5.2. Prove that, in order for an irreducible Markov chain, with state
space {0, 1,2,...} and with transition probabilities $p_{ij}$, to be recurrent, it is enough
to establish the existence of a sequence $(u_0, u_1, \ldots)$, with $\lim_{j \to \infty} u_j = \infty$, for which

$$u_j \ge \sum_i u_i p_{ij}, \quad \text{for all } j \ne 0.$$


Problem 8.5.3. Prove that an irreducible Markov chain, with state space
{0,1,2,...} and with transition probabilities $p_{ij}$, is positive recurrent if and
only if the system of equations $u_j = \sum_i u_i p_{ij}$, $j = 0, 1, \ldots$, admits a solution $u_j$,
$j = 0, 1, \ldots$, with $0 < \sum_j |u_j| < \infty$.


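Problem 8.5.4. Consider a Markov chain with state space {0,1,...} and with
transition probabilities $p_{00} = r_0$, $p_{01} = p_0 > 0$,

$$p_{ij} = \begin{cases} p_i > 0, & j = i + 1, \\ r_i \ge 0, & j = i, \\ q_i > 0, & j = i - 1, \\ 0 & \text{in all other cases.} \end{cases}$$

Setting $\rho_0 = 1$ and $\rho_m = \dfrac{q_1 \cdots q_m}{p_1 \cdots p_m}$, prove the following statements:

the chain is recurrent $\iff \sum_m \rho_m = \infty$;

the chain is transient $\iff \sum_m \rho_m < \infty$;

the chain is positive recurrent $\iff \sum_m \rho_m = \infty$, $\sum_m \dfrac{1}{p_m \rho_m} < \infty$;

the chain is null recurrent $\iff \sum_m \rho_m = \infty$, $\sum_m \dfrac{1}{p_m \rho_m} = \infty$.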


Problem 8.5.5. Prove that

$$f_{ik} \ge f_{ij} f_{jk}$$

and

$$\sup_n p_{ij}^{(n)} \le f_{ij} \le \sum_{n=1}^{\infty} p_{ij}^{(n)}.$$


Problem 8.5.6. Prove that, for any Markov chain with countable state space, the
Cesàro limits of the n-step transition probabilities $p_{ij}^{(n)}$ always exist, and one has

$$\lim_n \frac{1}{n} \sum_{k=1}^{n} p_{ij}^{(k)} = \frac{f_{ij}}{\mu_j}.$$

Problem 8.5.7. Let $\eta_1, \eta_2, \ldots$ be any sequence of independent and identically
distributed random variables, with $P\{\eta_k = j\} = p_j$, $j = 0, 1, \ldots$, and suppose
that the Markov chain $\xi_0, \xi_1, \ldots$ is chosen so that $\xi_{k+1} = (\xi_k - 1)^+ + \eta_{k+1}$, $k \ge 0$.
Compute the transition probabilities for this Markov chain and prove that, if $p_0 > 0$
and $p_0 + p_1 < 1$, then the chain is recurrent if and only if $\sum_k k p_k \le 1$.


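Problem 8.5.8. Let $\sigma_i = \inf\{n > 0 : X_n = i\}$ (with $\inf \varnothing = \infty$) and then define
$\sigma_i^n$ recursively through the relations:

$$\sigma_i^n = \begin{cases} \sigma_i^{n-1} + \sigma_i \circ \theta_{\sigma_i^{n-1}}, & \text{if } \sigma_i^{n-1} < \infty, \\ \infty, & \text{if } \sigma_i^{n-1} = \infty. \end{cases}$$

Prove that

$$P_i\{\sigma_i^n < \infty\} = (P_i\{\sigma_i < \infty\})^n \quad (= f_{ii}^n).$$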


Problem 8.5.9. Let $N_{\{i\}}$ denote the number of visits of a particular Markov chain
to the state i.
(a) Prove that

$$E_i N_{\{i\}} = \frac{1}{1 - P_i\{\sigma_i < \infty\}} \quad \left(= \frac{1}{1 - f_{ii}}\right).$$

(b) Reformulate the criteria for recurrence and transience of the state $i \in E$ from
[P §8.5, Theorem 1] in terms of the average number of visits $E_i N_{\{i\}}$.
(c) Prove that

$$E_i N_{\{j\}} = P_i\{\sigma_j < \infty\} \cdot E_j N_{\{j\}}.$$


Problem 8.5.10. (Necessary and sufficient condition for transience.) Let
$X = (X_n)_{n \ge 0}$ be some irreducible Markov chain with countable state space E and
transition probability matrix $\|p_{xy}\|$. Prove that the chain X is transient if and only
if there is a nontrivial and bounded function f = f(x) and a state $x^\circ \in E$, for
which one can claim that

$$f(x) = \sum_{y \ne x^\circ} p_{xy} f(y), \quad x \ne x^\circ$$

(harmonicity on the set $E \setminus \{x^\circ\}$).
Problem 8.5.11. (Sufficient condition for transience.) Let $X = (X_n)_{n \ge 0}$ be some
irreducible Markov chain with countable state space E. Suppose that there is a
bounded function f = f(x), such that, for some set $B \subseteq E$, one has

$$f(x^\circ) < f(x), \quad \text{for some } x^\circ \in \overline{B} \text{ and all } x \in B,$$

and

$$\sum_{y \in E} p_{xy} f(y) \le f(x), \quad x \in \overline{B} \ (= E \setminus B)$$

(superharmonicity on the set $\overline{B}$).
Prove that if the above condition holds, then the chain must be transient.


Problem 8.5.12. (Sufficient condition for recurrence.) Let $X = (X_n)_{n \ge 0}$ be some
irreducible Markov chain with countable state space E. Suppose that there is a
function h = h(x), $x \in E$, with the property that, for any constant c, one can
claim that the set $B_c = \{x : h(x) < c\}$ is finite and, for some finite set $A \subseteq E$,
one has

$$\sum_{y \in E} p_{xy} h(y) \le h(x), \quad x \in \overline{A}$$

(superharmonicity on the set $\overline{A}$ ($= E \setminus A$)).
Prove that the chain X is recurrent.


Problem 8.5.13. Prove that the sufficient condition formulated in the previous 
problem is also necessary. 


Problem 8.5.14. Let $(\xi_n)_{n \ge 1}$ be any sequence of independent and identically
distributed random variables, and let $X = (X_n)_{n \ge 0}$ be the random walk defined
as $X_0 = 0$ and $X_n = \xi_1 + \ldots + \xi_n$, for $n \ge 1$. Let $U(B) = E N_B$ denote the
expected number of the visits, $N_B = \sum_{n \ge 0} I_B(X_n)$, of the random walk X to the
set B. The set function $U(\cdot)$ is called potential-measure (in this case, for the starting
point x = 0); see Sect. A.7.

Analogously to the definitions of transience and recurrence for Markov chains
with countable state spaces (see Definitions 1 and 2 in [P §8.5, 2]), we will say
that the random walk X, which, in general, lives in the space R, is recurrent if

$$U(I) = \infty,$$

and will say that it is transient if

$$U(I) < \infty,$$

for every finite interval $I \subseteq R$.
Assuming that the expectation $E\xi_1$ is well defined, prove that one of the following
three properties always holds:


1. $X_n \to \infty$ (P-a. e.) and the random walk X is transient;
2. $X_n \to -\infty$ (P-a. e.) and the random walk X is transient;
3. $\liminf_n X_n = -\infty$, $\limsup_n X_n = +\infty$, i.e., the random walk oscillates between $-\infty$
and $+\infty$, in which case transience and recurrence are both possible.


Problem 8.5.15. Let everything be as in Problem 8.5.14 and again suppose that the
expectation $\mu = E\xi_1$ is well defined. Prove that:


1. If $0 < \mu \le \infty$, then $X_n \to \infty$ (P-a. e.).
2. If $-\infty \le \mu < 0$, then $X_n \to -\infty$ (P-a. e.).
3. If $\mu = 0$, then $\liminf_n X_n = -\infty$, $\limsup_n X_n = +\infty$ and the random walk is
recurrent.

Problem 8.5.16. Prove that a necessary and sufficient condition for the random
walk $X = (X_n)_{n \ge 0}$ to be transient is that $|X_n| \to \infty$ as $n \to \infty$ with probability 1.


Problem 8.5.17. Consider the Markov chain $X = (X_n)_{n \ge 0}$ with transition probabilities
$p_{ij}$, $i, j \in E = \{0, \pm 1, \pm 2, \ldots\}$, chosen so that $p_{ij} = 0$ if $|i - j| > 1$, i.e.,
for every $i \in E$ one has

$$p_{i,i-1} + p_{ii} + p_{i,i+1} = 1,$$

all probabilities in the left side being strictly positive.

Prove that any such chain must be irreducible and aperiodic. Under what
conditions for the transition probabilities ($p_{ii}, p_{i,i-1}, p_{i,i+1}$; $i \in E$) is the Markov
chain X transient, recurrent, positive recurrent and null-recurrent (comp. with
Problem 8.5.4)?

Hint. Write down the recursive rule that governs the probabilities
$V(i) = P_i\{\tau_{j^\circ} = \infty\}$, $i \in E$, for any fixed $j^\circ$.


Problem 8.5.18. (On the probability for degeneracy in the Galton-Watson model.)
In their study of the extinction of family names in England in the late nineteenth
century, F. Galton and H. W. Watson proposed the following model, which carries
their names.

Let $\xi_0, \xi_1, \xi_2, \ldots$ be some sequence of random variables that take values in
$N = \{0, 1, 2, \ldots\}$ and can be written as random sums of random variables:

$$\xi_{n+1} = \eta_1^{(n)} + \ldots + \eta_{\xi_n}^{(n)}, \quad n \ge 0.$$

(Comp. with [P §1.12, Example 4].) Suppose further that the family $\{\eta_i^{(n)},\ i \ge 1,\ n \ge 0\}$
is comprised of independent random variables, every one of which is
distributed as the random variable $\eta$, chosen so that $P\{\eta = k\} = p_k$, $k \ge 0$,
and $\sum_{k=0}^{\infty} p_k = 1$. In this model, each $\xi_n$ represents “the number of parents” in
the n-th generation on the family tree, while each $\eta_i^{(n)}$ represents “the number
of offspring” produced by the i-th parent. Thus, $\xi_{n+1}$ is exactly the number of
offspring that comprise the (n + 1)-st generation, with the understanding that if
$\xi_n = 0$, then $\xi_k = 0$ for all $k > n$.

Let $\tau = \inf\{n \ge 0 : \xi_n = 0\}$ denote the time of extinction for the family, with
the understanding that $\tau = \infty$ if $\xi_n > 0$ for all $n \ge 0$. The main question is how to
calculate the probability for extinction in finite time, namely the probability

$$q = P\{\tau < \infty\}.$$

It turns out that the most efficient method for calculating the above probability is
the method of generating functions (see Problem 2.6.28). Consider the generating
functions $g(s) = E s^{\eta} = \sum_{k=0}^{\infty} p_k s^k$, $|s| \le 1$, and $f_n(s) = E s^{\xi_n}$, $n \ge 0$, and prove
the following properties of the Galton-Watson model:

(a) $f_n(s) = f_{n-1}(g(s)) = f_{n-2}(g(g(s))) = \ldots = f_0(g^{(n)}(s))$, where
$g^{(n)}(s) = (g \circ \ldots \circ g)(s)$ (n times);

(b) if $\xi_0 = 1$, then $f_0(s) = s$ and $f_n(s) = g^{(n)}(s) = g(f_{n-1}(s))$;

(c) $f_n(0) = P\{\xi_n = 0\}$;

(d) $\{\xi_n = 0\} \subseteq \{\xi_{n+1} = 0\}$;

(e) $P\{\tau < \infty\} = P\left(\bigcup_{n=1}^{\infty} \{\xi_n = 0\}\right) = \lim_{N \to \infty} P\{\xi_N = 0\} = \lim_{N \to \infty} f_N(0)$;

(f) if $q = P\{\tau < \infty\}$, then q is one of the roots of the equation $x = g(x)$, $0 \le x \le 1$.


Problem 8.5.19. (Continuation of Problem 8.5.18.) Let $g(s) = E s^{\eta}$ be the generating
function of the random variable $\eta$, which takes values in the set {0,1,2,...}.
Prove that:

(a) The function g = g(s) is non-decreasing and convex on [0, 1].

(b) If $P\{\eta = 0\} < 1$, then the function g = g(s) is strictly increasing.

(c) If $P\{\eta \le 1\} < 1$, then the function g = g(s) is strictly convex.

(d) If $P\{\eta \le 1\} < 1$ and $E\eta \le 1$, then the equation $x = g(x)$, $0 \le x \le 1$, has
a unique solution $q \in [0, 1]$.

(e) If $P\{\eta \le 1\} < 1$ and $E\eta > 1$, then the equation $x = g(x)$, $0 \le x \le 1$, has
two solutions: x = 1 and $x = q \in (0, 1)$.

Hint. Show that $g'(x) \ge 0$ and $g''(x) \ge 0$, $x \in [0, 1]$, and consider separately
the graphs of the function g = g(x) in the case $E\eta \le 1$ and in the case $E\eta > 1$.


Problem 8.5.20. (Continuation of Problems 8.5.18 and 8.5.19.) Consider the
Galton-Watson model with $E\eta > 1$ and prove that the probability for extinction
$q = P\{\tau < \infty\}$ can be identified with the only root of the equation $x = g(x)$ that
is located strictly between 0 and 1, i.e.,

$$E\eta > 1 \implies 0 < P\{\tau < \infty\} < 1.$$

If $E\eta \le 1$ and $p_1 \ne 1$, then extinction occurs with probability 1,
i.e.,

$$E\eta \le 1 \implies P\{\tau < \infty\} = 1.$$
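
Properties (b), (c) and (e) of Problem 8.5.18 suggest a simple numerical recipe: with $\xi_0 = 1$, the numbers $f_n(0) = g(f_{n-1}(0))$ increase to the extinction probability q. The sketch below uses two offspring distributions that are assumptions made for illustration only (they do not come from the text), and hypothetical function names `pgf` and `extinction_probability`.

```python
def pgf(p, s):
    """Generating function g(s) = sum_k p_k s^k of the offspring law p."""
    return sum(pk * s ** k for k, pk in enumerate(p))

def extinction_probability(p, iterations=500):
    """Iterate q_n = g(q_{n-1}) from q_0 = 0, so that q_n = f_n(0) = P{xi_n = 0}."""
    q = 0.0
    for _ in range(iterations):
        q = pgf(p, q)
    return q

# Assumed examples: mean offspring 1.25 (supercritical) and 0.75 (subcritical).
supercritical = [0.25, 0.25, 0.5]
subcritical = [0.5, 0.25, 0.25]

q_super = extinction_probability(supercritical)
q_sub = extinction_probability(subcritical)
print(q_super, q_sub)
```

For the first law the equation $x = g(x)$ reads $2x^2 - 3x + 1 = 0$, with roots 1/2 and 1, and the iteration settles on the smaller root; for the second law it converges to 1.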


Problem 8.5.21. Consider the Galton-Watson model with $p_1 < 1$. Prove that for
every fixed $k \ge 1$ one has $P\{\xi_n = k \text{ i.o.}\} = 0$. Conclude that
$P\{\lim_n \xi_n \in \{0, \infty\}\} = 1$.


Problem 8.5.22. Consider the Markov chain $X = (X_n)_{n \ge 0}$, with countably infinite
state space E = {1,2,...}, and suppose that all states are inessential. Prove that
each of the following conditions is necessary and sufficient for the chain to be
irreducible and recurrent:

(a) $f_{ij} = 1$ for all $i, j \in E$ (i.e., $P_i\{\sigma(j) < \infty\} = 1$, where
$\sigma(j) = \inf\{n > 0 : X_n = j\}$).

(b) Every finite and non-negative function h = h(i), $i \in E$, which is excessive
for the chain X (i.e., $h(i) \ge \sum_{j \in E} p_{ij} h(j)$, $i \in E$, where $p_{ij}$ are the transition
probabilities for X), must be a constant.

Hint. The necessity of (b) is established in Sect. A.7 in the Appendix. To
establish the sufficiency, prove that for any $i, j \in E$ one must have

$$f_{ij} = p_{ij} + \sum_{k \ne j} p_{ik} f_{kj},$$

and conclude that if all excessive functions are constants, then $f_{ij} = 1$ for all $i, j \in E$,
which, according to (a), is equivalent to the claim that the chain is irreducible
and recurrent.


8.6-7 On the Limiting, Stationary, and Ergodic Distributions 
of Markov Chains with at Most Countable State Space 


Problem 8.6-7.1. Describe the limiting, stationary and ergodic distributions of the 
Markov chain with transition probability matrix 


$$\begin{pmatrix} 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 0 & 1 \\ 1/4 & 1/2 & 1/4 & 0 \\ 0 & 1/2 & 1/2 & 0 \end{pmatrix}$$


Problem 8.6-7.2. Let $P = \|p_{ij}\|$ be some $m \times m$-matrix ($m < \infty$), which is doubly-stochastic
(i.e., $\sum_{j=1}^{m} p_{ij} = 1$, for $i = 1, \ldots, m$, and $\sum_{i=1}^{m} p_{ij} = 1$, for
$j = 1, \ldots, m$). Prove that the uniform distribution Q = (1/m,..., 1/m) is stationary
for the associated Markov chain.
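
A small instance makes the statement of Problem 8.6-7.2 tangible. The circulant matrix below is an assumed example (not taken from the text); the check uses exact fractions, and the names `row_sums`, `col_sums` and `QP` are illustrative.

```python
from fractions import Fraction as F

# Assumed example of a doubly-stochastic matrix (a 3x3 circulant).
P = [[F(1, 2), F(1, 3), F(1, 6)],
     [F(1, 6), F(1, 2), F(1, 3)],
     [F(1, 3), F(1, 6), F(1, 2)]]
m = len(P)

row_sums = [sum(P[i][j] for j in range(m)) for i in range(m)]
col_sums = [sum(P[i][j] for i in range(m)) for j in range(m)]

# Stationarity of the uniform law: (QP)_j = (1/m) * (j-th column sum) = 1/m.
Q = [F(1, m)] * m
QP = [sum(Q[i] * P[i][j] for i in range(m)) for j in range(m)]

print(row_sums, col_sums, QP == Q)
```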


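Problem 8.6-7.3. Let $X = (X_n)_{n \ge 0}$ be some Markov chain with state space E =
{0, 1} and with transition probability matrix
$P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}$, for some choice of
$0 < \alpha < 1$ and $0 < \beta < 1$.
Prove that:
(a)

$$P^n = \frac{1}{2 - (\alpha + \beta)} \begin{pmatrix} 1-\beta & 1-\alpha \\ 1-\beta & 1-\alpha \end{pmatrix} + \frac{(\alpha + \beta - 1)^n}{2 - (\alpha + \beta)} \begin{pmatrix} 1-\alpha & -(1-\alpha) \\ -(1-\beta) & 1-\beta \end{pmatrix};$$

(b) if the initial distribution is $\pi = (\pi(0), \pi(1))$, then

$$P_\pi\{X_n = 0\} = \frac{1-\beta}{2 - (\alpha + \beta)} + (\alpha + \beta - 1)^n \left[\pi(0) - \frac{1-\beta}{2 - (\alpha + \beta)}\right].$$

Problem 8.6-7.4. (Continuation of Problem 8.6-7.3.) Find the stationary distribution,
$\pi^\circ$, of the Markov chain X and calculate the covariance

$$\mathrm{cov}_{\pi^\circ}(X_n, X_{n+1}) = E_{\pi^\circ} X_n X_{n+1} - E_{\pi^\circ} X_n\, E_{\pi^\circ} X_{n+1}.$$

Setting $S_n = X_1 + \cdots + X_n$, prove that

$$E_{\pi^\circ} S_n = \frac{n(1-\alpha)}{2 - (\alpha + \beta)} \quad \text{and} \quad D_{\pi^\circ} S_n \le cn,$$

where c is some constant.
Finally, prove that almost surely (with respect to any of the measures $P_0$, $P_1$ and
$P_{\pi^\circ}$) one has

$$\frac{S_n}{n} \to \frac{1-\alpha}{2 - (\alpha + \beta)} \quad \text{as } n \to \infty.$$


Problem 8.6-7.5. Let $P = \|p_{ij}\|$ be a transition probability matrix ($i, j \in E =
\{0, 1, 2, \ldots\}$), chosen so that for any $i \in E \setminus \{0\}$ one has $p_{i,i+1} = p_i$ and
$p_{i0} = 1 - p_i$, for some $0 < p_i < 1$, and for $i = 0 \in E$ one has $p_{01} = 1$.

Prove that all states of the associated Markov chain would be recurrent if and
only if $\lim_n \prod_{j=1}^{n} p_j = 0$ (or, equivalently, $\sum_{j=1}^{\infty} (1 - p_j) = \infty$).

Show also that, if all states are recurrent, then all states can be claimed to be
positive recurrent if and only if

$$\sum_{k=1}^{\infty} \prod_{j=1}^{k} p_j < \infty.$$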
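
The closed form for $P^n$ in Problem 8.6-7.3(a) can be compared against direct matrix powers. The sketch below assumes the two-state matrix $P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}$, works in exact fractions, and uses helper names (`mat_mul`, `mat_pow`, `closed_form`) and parameter values that are illustrative only.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(P, n):
    """n-th power of a 2x2 matrix by repeated multiplication."""
    R = [[F(1), F(0)], [F(0), F(1)]]
    for _ in range(n):
        R = mat_mul(R, P)
    return R

def closed_form(alpha, beta, n):
    """Right-hand side of the formula in Problem 8.6-7.3(a)."""
    d = 2 - (alpha + beta)
    c = (alpha + beta - 1) ** n
    return [[((1 - beta) + c * (1 - alpha)) / d, ((1 - alpha) - c * (1 - alpha)) / d],
            [((1 - beta) - c * (1 - beta)) / d, ((1 - alpha) + c * (1 - beta)) / d]]

alpha, beta = F(3, 10), F(3, 5)
P = [[alpha, 1 - alpha], [1 - beta, beta]]
agree = all(mat_pow(P, n) == closed_form(alpha, beta, n) for n in range(8))
print(agree)
```

Since $|\alpha + \beta - 1| < 1$, the second term vanishes as n grows, which is how the stationary distribution of Problem 8.6-7.4 shows up in the rows of the first matrix.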
Problem 8.6-7.6. Prove that if $X = (X_k)_{k \ge 0}$ is some irreducible and positive
recurrent Markov chain, with invariant distribution $\pi^\circ$, then, for every fixed $x \in E$,
one has ($P_\pi$-a.e., for every initial distribution $\pi$)

$$\frac{1}{n} \sum_{k=0}^{n-1} I_{\{x\}}(X_k) \to \pi^\circ(x) \quad \text{as } n \to \infty,$$

and

$$\frac{1}{n} \sum_{k=0}^{n-1} p_{yx}^{(k)} \to \pi^\circ(x) \quad \text{as } n \to \infty, \text{ for every } y \in E.$$

(Comp. with the law of large numbers from [P §1.12].)
In addition, prove that if the chain is irreducible and null recurrent, then one has
($P_\pi$-a.e., for every initial distribution $\pi$)

$$\frac{1}{n} \sum_{k=0}^{n-1} I_{\{x\}}(X_k) \to 0 \quad \text{as } n \to \infty,$$

and

$$\frac{1}{n} \sum_{k=0}^{n-1} p_{yx}^{(k)} \to 0 \quad \text{as } n \to \infty, \text{ for every } y \in E.$$

Problem 8.6-7.7. Consider the Markov chain $X = (X_n)_{n \ge 0}$, with finite state space
E = {0,1,..., N}, and suppose that this chain is also a martingale. Prove that:
(a) The states {0} and {N} must be absorbing (i.e., $p_{00} = p_{NN} = 1$).

(b) If $\tau(x) = \inf\{n \ge 0 : X_n = x\}$, then $P_x\{\tau(N) < \tau(0)\} = x/N$.
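
Part (b) of Problem 8.6-7.7 can be observed on a concrete martingale chain. The sketch below uses the Wright-Fisher chain, $p_{ij} = C_N^j (i/N)^j (1 - i/N)^{N-j}$, as an assumed example (it is not mentioned in the text): its conditional mean of the next state is i, and the states 0 and N are absorbing. The names `wright_fisher` and `absorption_at_top` are hypothetical.

```python
from math import comb

def wright_fisher(N):
    """Transition matrix p_ij = C(N, j) (i/N)^j (1 - i/N)^(N - j)."""
    return [[comb(N, j) * (i / N) ** j * (1 - i / N) ** (N - j)
             for j in range(N + 1)] for i in range(N + 1)]

def absorption_at_top(P, steps=400):
    """h_k(x) = P_x{X_k = N}; since N is absorbing, h_k(x) increases to
    P_x{tau(N) < tau(0)} as k grows."""
    n = len(P)
    h = [0.0] * (n - 1) + [1.0]
    for _ in range(steps):
        h = [sum(P[x][y] * h[y] for y in range(n)) for x in range(n)]
    return h

N = 6
P = wright_fisher(N)
drift = [sum(j * P[i][j] for j in range(N + 1)) for i in range(N + 1)]  # E[X_1 | X_0 = i]
h = absorption_at_top(P)
print(h)
```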


8.8 Simple Random Walks as Markov Chains 


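Problem 8.8.1. Prove Stirling’s formula ($n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}$) by using the
following argument ([9, Problem 27.18]). Let $S_n = X_1 + \ldots + X_n$, $n \ge 1$, where
$X_1, X_2, \ldots$ are independent random variables, all distributed with Poisson law of
parameter $\lambda = 1$. Then:

(a) $E\left[\left(\dfrac{S_n - n}{\sqrt{n}}\right)^{-}\right] = e^{-n} \sum_{k=0}^{n} \left(\dfrac{n-k}{\sqrt{n}}\right) \dfrac{n^k}{k!} = \dfrac{n^{n+1/2} e^{-n}}{n!}$;

(b) $\mathrm{Law}\left[\left(\dfrac{S_n - n}{\sqrt{n}}\right)^{-}\right] \to \mathrm{Law}[N^{-}]$,
where N is some standard normal random variable;

(c) $E\left[\left(\dfrac{S_n - n}{\sqrt{n}}\right)^{-}\right] \to E N^{-} = \dfrac{1}{\sqrt{2\pi}}$;

(d) $n! \sim \sqrt{2\pi}\, n^{n+1/2} e^{-n}$.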
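
Steps (a) and (c) of Problem 8.8.1 lend themselves to a numerical check. The sketch below evaluates both sides of identity (a) for one value of n (the choice n = 100 is arbitrary) and compares them with $1/\sqrt{2\pi}$; logarithms via `math.lgamma` are used only to avoid overflow, and the function names are illustrative.

```python
import math

def neg_part_mean(n):
    """e^{-n} * sum_{k=0}^{n} ((n - k) / sqrt(n)) * n^k / k!, i.e. the expectation
    of the negative part of (S_n - n)/sqrt(n) for S_n ~ Poisson(n)."""
    return sum((n - k) / math.sqrt(n)
               * math.exp(-n + k * math.log(n) - math.lgamma(k + 1))
               for k in range(n + 1))

def closed_form(n):
    """n^{n + 1/2} e^{-n} / n!, computed through logarithms."""
    return math.exp((n + 0.5) * math.log(n) - n - math.lgamma(n + 1))

n = 100
lhs, rhs = neg_part_mean(n), closed_form(n)
limit = 1.0 / math.sqrt(2.0 * math.pi)
print(lhs, rhs, limit)
```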


Problem 8.8.2. Prove the Markov property in [ P §8.8, (28)]. 
Problem 8.8.3. Prove the formula in [ P §8.8, (30)]. 


Problem 8.8.4. Consider the Markov chain in the Ehrenfests’ model and prove that 
all states in that chain are recurrent. 


Problem 8.8.5. Verify the formulas [ P §8.8, (31) and (32)]. 


Problem 8.8.6. Consider the simple random walk on $Z = \{0, \pm 1, \pm 2, \ldots\}$, with
$p_{x,x+1} = p$, $p_{x,x-1} = 1 - p$, and prove that the function $f(x) = \left(\dfrac{1-p}{p}\right)^x$, $x \in Z$,
is harmonic.
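
Harmonicity in Problem 8.8.6 means $p f(x+1) + (1-p) f(x-1) = f(x)$ for every x. The sketch below checks this identity in exact arithmetic on a finite window of states; the value p = 2/5 and the window are arbitrary choices made for illustration.

```python
from fractions import Fraction as F

p = F(2, 5)   # assumed value of p, any 0 < p < 1 would do

def f(x):
    """Candidate harmonic function f(x) = ((1 - p) / p)^x."""
    return ((1 - p) / p) ** x

# Tf(x) = p f(x + 1) + (1 - p) f(x - 1) should coincide with f(x).
harmonic = all(p * f(x + 1) + (1 - p) * f(x - 1) == f(x) for x in range(-5, 6))
print(harmonic)
```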


Problem 8.8.7. Let $\xi_1, \ldots, \xi_n$ be independent and identically distributed random
variables and let $S_k = \xi_1 + \ldots + \xi_k$, $k \le n$. Prove that

$$\sum_{k \le n} I(S_k > 0) \overset{d}{=} \min\{k \le n : S_k = \max_{j \le n} S_j\},$$

where $\overset{d}{=}$ stands for “identity in distribution.” (This result, which is due to E. Sparre
Andersen, clarifies why, in the Bernoulli scheme, the law of the time spent on the
positive axis and the law of the location of the maximum are asymptotically the
same as the arc-sine law; see [ P §1.10] and Problems 1.10.4 and 1.10.5.)


Problem 8.8.8. Let $\xi_1, \xi_2, \ldots$ be a sequence of independent Bernoulli random variables
with $P\{\xi_n = 1\} = P\{\xi_n = -1\} = 1/2$, $n \ge 1$. Setting $S_0 = 0$ and
$S_n = \xi_1 + \ldots + \xi_n$, prove that if $\tau$ is any stopping time and

$$\widetilde{S}_n = S_{n \wedge \tau} - (S_n - S_{n \wedge \tau}) = \begin{cases} S_n, & n \le \tau, \\ 2 S_\tau - S_n, & n > \tau, \end{cases}$$

then $(\widetilde{S}_n)_{n \ge 0} \overset{d}{=} (S_n)_{n \ge 0}$, i.e., the distribution laws of the sequences $(\widetilde{S}_n)_{n \ge 0}$
and $(S_n)_{n \ge 0}$ coincide. (This result is known as André’s reflection principle for
the symmetric random walk $(S_n)_{n \ge 0}$; comp. with other versions of the reflection
principle described in [ P §1.10].)
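
For a finite horizon, the reflection principle of Problem 8.8.8 can be verified by brute force: all $2^n$ step sequences are equally likely, so it suffices to see that reflection permutes them. The sketch below assumes the particular stopping time $\tau$ = first time the walk hits a given level (one choice among many) and a horizon of 6 steps; the function name `reflect` is hypothetical.

```python
from itertools import product

def reflect(steps, level):
    """Negate all increments after tau, the first time n >= 1 with S_n == level.
    If the level is not reached within the horizon, the path is left unchanged."""
    s = 0
    for n, step in enumerate(steps, start=1):
        s += step
        if s == level:
            return steps[:n] + tuple(-x for x in steps[n:])
    return steps

horizon, level = 6, 1
paths = list(product((-1, 1), repeat=horizon))   # 2^6 equally likely step sequences
images = [reflect(path, level) for path in paths]

# Reflection permutes the set of paths, so the reflected walk has the same law.
same_law = sorted(images) == sorted(paths)
involution = all(reflect(reflect(path, level), level) == path for path in paths)
print(same_law, involution)
```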

Problem 8.8.9. Suppose that $X = (X_n)_{n \ge 0}$ is a random walk on the lattice $Z^d$,
defined by: $X_0 = 0$ and

$$X_n = \xi_1 + \ldots + \xi_n \ \text{ for } n \ge 1, \quad \text{and} \quad P\{\xi_1 = e\} = \frac{1}{2d},$$

where the vector $e = (e_1, \ldots, e_d) \in R^d$ is chosen so that $e_i = 0, -1, +1$ and
$|e| \equiv |e_1| + \ldots + |e_d| = 1$.

Prove the following multivariate analog of the Central Limit Theorem, in which
A stands for any open ball in $R^d$ centered at the origin 0 = (0,..., 0):

$$\lim_n P\left\{\frac{X_n}{\sqrt{n}} \in A\right\} = \int_A \left(\frac{d}{2\pi}\right)^{d/2} e^{-\frac{d}{2}(x_1^2 + \ldots + x_d^2)}\, dx_1 \ldots dx_d.$$

Hint. Prove first that the characteristic function $\varphi(t) = E e^{i(t, \xi_1)}$, $t = (t_1, \ldots, t_d)$,
is given by the formula $\varphi(t) = d^{-1} \sum_{j=1}^{d} \cos(t_j)$, and then use the multivariate
version of the continuity theorem (see [ P §3.3, Theorem 1] and Problem 3.3.5).
Problem 8.8.10. Let $X = (X_n)_{n \ge 0}$ be the random walk introduced in Problem 8.8.9
and let $N_n = \sum_{k=0}^{n-1} I(X_k = 0)$ be the number of moments $k \in \{0, 1, \ldots, n-1\}$ at
which $X_k = 0$. It is shown in [P §7.9] that, for d = 1, one has $E N_n \sim \sqrt{2n/\pi}$ as
$n \to \infty$. (In formulas [ P §7.9, (17) and (18)], one must replace = with 2)
(a) Prove that for $d \ge 2$ one has:

$$E N_n \sim \begin{cases} \dfrac{1}{\pi} \ln n, & d = 2, \\ c_d, & d \ge 3, \end{cases}$$

where $c_d = 1/P\{\sigma_d = \infty\}$, with $\sigma_d = \inf\{k > 0 : X_k = 0\}$ ($\sigma_d = \infty$ when the
infimum is taken over the empty set). Calculate the values of the constant $c_d$.
(b) Prove that for d = 2 one has:


and

$$P\{\sigma_2 > n\} = P\{N_n = 0\} \sim \frac{\pi}{\ln n} \quad \text{as } n \to \infty.$$

(c) Prove that for $d \ge 3$ one has, as $n \to \infty$:

$$P\{\sigma_d = 2n\} \sim \frac{P\{X_{2n} = 0\}}{\left[1 + \sum_{k=1}^{\infty} P\{X_{2k} = 0\}\right]^2}.$$

(d) Prove that for $d \ge 1$ one has:

$$P\{\sigma_d = \infty\} = \frac{1}{1 + \sum_{k=1}^{\infty} P\{X_{2k} = 0\}}.$$

Remark. Property (d) is essentially established in the proof of [ P §8.5, Theorem 1].
It is also useful to notice that Pólya’s Theorem:

$P\{\sigma_d < \infty\} = 1$ for d = 1 and d = 2 (recurrence with probability 1);

$P\{\sigma_d < \infty\} < 1$ for $d \ge 3$ (transience with positive probability),

obtains directly from property (d), in conjunction with the asymptotic property
$P\{X_{2n} = 0\} \sim \dfrac{c(d)}{n^{d/2}}$, for $d \ge 1$, and with $c(d) > 0$.


Problem 8.8.11. Consider the Dirichlet problem for the Poisson equation in the
domain $C \subseteq E$, where E is an at most countable set, namely: find a non-negative
function V = V(x), such that

$$LV(x) = -h(x), \quad x \in C,$$
$$V(x) = g(x), \quad x \in D = E \setminus C,$$

where h(x) and g(x) are given non-negative functions.
Prove that the smallest non-negative solution $V_D(x)$ for this problem is given by:

$$V_D(x) = E_x[I(\tau(D) < \infty)\, g(X_{\tau(D)})] + I_{\overline{D}}(x)\, E_x\left[\sum_{k=0}^{\tau(D)-1} h(X_k)\right],$$

where $\tau(D) = \inf\{n \ge 0 : X_n \in D\}$ (as usual, we suppose that $\tau(D) = \infty$, if the
infimum is taken over the empty set).
Hint. Write the function $V_D(x)$ in the form $V_D(x) = \varphi_D(x) + \psi_D(x)$, where
($\overline{D} = C$)

$$\varphi_D(x) = E_x[I(\tau(D) < \infty)\, g(X_{\tau(D)})],$$

$$\psi_D(x) = I_{\overline{D}}(x)\, E_x\left[\sum_{k=0}^{\tau(D)-1} h(X_k)\right].$$

Then observe that the functions $\varphi_D(x)$ and $\psi_D(x)$ can be written in the form

$$\varphi_D(x) = I_D(x) g(x) + I_{\overline{D}}(x)\, T\varphi_D(x),$$
$$\psi_D(x) = I_{\overline{D}}(x) h(x) + I_{\overline{D}}(x)\, T\psi_D(x),$$

and conclude from the last relations that

$$V_D(x) = I_D(x) g(x) + I_{\overline{D}}(x)[h(x) + TV_D(x)],$$

which implies that the above function gives a non-negative solution to the system:
$LV(x) = -h(x)$ in the domain C and $V(x) = g(x)$, for $x \in D$.

To prove that $V(x) \ge V_D(x)$, for every non-negative solution V(x) to this
system, it is enough to notice that $V(x) = I_D(x) g(x) + I_{\overline{D}}(x)[h(x) + TV(x)]$,
from where one finds that

$$V(x) \ge I_D(x) g(x) + I_{\overline{D}}(x) h(x),$$

and conclude by induction that

$$V(x) \ge \sum_{k=0}^{n} (I_{\overline{D}} T)^k [I_D g + I_{\overline{D}} h](x),$$

for every $n \ge 0$. This implies that

$$V(x) \ge \sum_{k \ge 0} (I_{\overline{D}} T)^k [I_D g + I_{\overline{D}} h](x) = \varphi_D(x) + \psi_D(x) = V_D(x).$$


k>0 


Problem 8.8.12. Let $X = (X_n)_{n \ge 0}$ be a simple symmetric random walk on the
lattice $Z^d$ and let

$$\sigma(D) = \inf\{n \ge 0 : X_n \notin D\}, \quad D \subseteq Z^d,$$

assuming that the set D is finite. Prove that one can find positive constants c = c(D)
and $\varepsilon = \varepsilon(D) < 1$, such that

$$P_x\{\sigma(D) \ge n\} \le c\, \varepsilon^n,$$

for all $x \in D$. (Comp. with the inequality in [P §1.9, (20)].)

Problem 8.8.13. Consider two independent simple symmetric random walks,
$X^1 = (X_n^1)_{n \ge 0}$ and $X^2 = (X_n^2)_{n \ge 0}$, that start, respectively, from $x \in Z$ and $y \in Z$,
and are defined on the same probability space $(\Omega, \mathscr{F}, P)$. Let
$\tau^1(x) = \inf\{n \ge 0 : X_n^1 = 0\}$ and $\tau^2(y) = \inf\{n \ge 0 : X_n^2 = 0\}$. Find the probability
$P\{\tau^1(x) < \tau^2(y)\}$.
Problem 8.8.14. Prove that for a simple symmetric random walk, $X = (X_n)_{n \ge 0}$,
on the lattice $Z = \{0, \pm 1, \pm 2, \ldots\}$ that starts from $0 \in Z$, one must have

$$P_0\{\tau(y) = N\} \sim \frac{|y|}{\sqrt{2\pi}}\, N^{-3/2} \quad \text{as } N \to \infty,$$

where $\tau(y) = \inf\{n \ge 0 : X_n = y\}$, $y \ne 0$. (Comp. with the results in
Problem 2.4.16.)


Problem 8.8.15. Consider the simple random walk $X = (X_n)_{n \ge 0}$ with
$X_n = x + \xi_1 + \ldots + \xi_n$, where $\xi_1, \xi_2, \ldots$ are independent and identically distributed random
variables, with $P\{\xi_1 = 1\} = p$, $P\{\xi_1 = -1\} = q$, $p + q = 1$, and $x \in Z$. Setting
$\sigma(x) = \inf\{n > 0 : X_n = x\}$, prove that

$$P_x\{\sigma(x) < \infty\} = 2 \min(p, q).$$
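
The identity of Problem 8.8.15 can be checked numerically by summing first-return probabilities. The sketch below relies on the classical count of first-return paths, $P_x\{\sigma(x) = 2n\} = \frac{1}{2n-1} C_{2n}^{n} (pq)^n$, which is brought in here as an assumption (it is not stated in the text); the function name and the truncation at 2000 terms are illustrative, and the symmetric case p = 1/2 is avoided because the series then converges too slowly for this simple truncation.

```python
def return_probability(p, terms=2000):
    """Partial sum of sum_{n >= 1} C(2n, n) / (2n - 1) * (p q)^n, accumulated
    through the ratio of consecutive terms, 2 (2n - 1) / (n + 1) * p q."""
    x = p * (1.0 - p)
    term = 2.0 * x            # n = 1: C(2, 1) / 1 * (p q)
    total = 0.0
    for n in range(1, terms + 1):
        total += term
        term *= 2.0 * (2 * n - 1) / (n + 1) * x
    return total

for p in (0.3, 0.8):
    print(p, return_probability(p), 2 * min(p, 1 - p))
```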

Problem 8.8.16. Consider the random walk introduced in the previous problem in
the special case x = 0, and let $R_n$ denote the total number of (different) integer
values that appear in the set $\{X_0, X_1, \ldots, X_n\}$ (note that $X_0 = 0$). Prove that

$$E_0 \frac{R_n}{n} \to |p - q| \quad \text{as } n \to \infty.$$


Problem 8.8.17. Let $X = (X_n)_{n \ge 0}$ be the simple random walk on
$\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$, given by $X_0 = 0$ and $X_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$, for some
sequence, $\xi_1, \xi_2, \ldots$, of independent and identically distributed random variables
with $P\{\xi_1 = 1\} = p$ and $P\{\xi_1 = -1\} = q$ $(= 1 - p)$, for some fixed $0 < p < 1$.
Prove that the sequence $|X| = (|X_n|)_{n \ge 0}$ is a Markov chain with state space
$E = \{0, 1, 2, \ldots\}$ and with transition probabilities

$$p_{i,i+1} = \frac{p^{i+1} + q^{i+1}}{p^i + q^i} = 1 - p_{i,i-1}, \quad i > 0, \qquad p_{0,1} = 1.$$

In addition, prove that

$$P(X_n = i \mid |X_n| = i, |X_{n-1}| = i_{n-1}, \ldots, |X_1| = i_1) = \frac{p^i}{p^i + q^i},$$

for $n \ge 1$.


Problem 8.8.18. Let $\xi = (\xi_0, \xi_1, \xi_2, \ldots)$ be any sequence of independent and
identically distributed random variables with $P\{\xi_i = 1\} = p$ and $P\{\xi_i = -1\} = q$,
$p + q = 1$, and suppose that the sequence $X = (X_n)_{n \ge 0}$ is defined as
$X_n = \xi_n \xi_{n+1}$. Is this sequence a Markov chain? Is the sequence $Y = (Y_n)_{n \ge 1}$, defined as
$Y_n = \frac{1}{2}(\xi_{n-1} + \xi_n)$, $n \ge 1$, a Markov chain?


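Problem 8.8.19. Suppose that $X = (X_n)_{n \ge 0}$ is a simple symmetric random walk
on the lattice $\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$, which starts from 0, and let $\sigma_1, \sigma_2, \ldots$ be the
moments of return to 0, i.e., $\sigma_1 = \inf\{n > 0 : X_n = 0\}$, $\sigma_2 = \inf\{n > \sigma_1 : X_n = 0\}$,
etc.

Prove that:

(a) $P_0\{\sigma_n < \infty\} = 1$;

(b) $P_0\{X_{2n} = 0\} = \sum_{k \ge 1} P_0\{\sigma_k = 2n\}$;

(c) $E_0 z^{\sigma_k} = (E_0 z^{\sigma_1})^k$, $|z| < 1$;

(d) $\sum_{n \ge 0} P_0\{X_{2n} = 0\}\,z^{2n} = \dfrac{1}{1 - E_0 z^{\sigma_1}}$;

(e) $E_0 z^{\sigma_1} = 1 - \sqrt{1 - z^2}$, so that $\sum_{n \ge 0} P_0\{X_{2n} = 0\}\,z^{2n} = \dfrac{1}{\sqrt{1 - z^2}}$;

(f) if $N(k)$ denotes the number of visits of state $k$ before the first return to 0, i.e.,
before time $\sigma_1$, then $E_0 N(k) = 1$, for any $k \in \mathbb{Z} \setminus \{0\}$.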


Problem 8.8.20. Let $X = (X_n)_{n \ge 0}$ and $Y = (Y_n)_{n \ge 0}$ be any two simple symmetric
random walks on $\mathbb{Z}^d$, $d \ge 1$, and let

$$R_n = \sum_{i=0}^{n} \sum_{j=0}^{n} I(X_i = Y_j).$$

Prove that when $d = 1$ the expectation $ER_n$, i.e., the expected number of periods
during which the two random walks meet before time $n$ (taking into account multiple
visits of the same state), behaves, asymptotically, for large values of $n$, as $c\,n^{3/2}$, for
some constant $c > 0$. It is well known (see [75], for example) that in dimensions
$d > 1$ one has (for large $n$)

$$ER_n \sim \begin{cases} c\,n, & d = 2; \\ c\,\sqrt{n}, & d = 3; \\ c\,\ln n, & d = 4; \\ c, & d \ge 5, \end{cases}$$

where the constant $c = c_d$ depends on the dimension $d$. Verify the above
asymptotics for $ER_n$ and compute the constants $c = c_d$.


Problem 8.8.21. Suppose that $B$ is some finite set inside $\mathbb{Z}^d$ and the function
$f = f(x)$ is defined for $x \in B \cup \partial B$, where
$\partial B = \{x \notin B : |x - y| = 1 \text{ for some } y \in B\}$. Then suppose that the function $f = f(x)$ is subharmonic in $B$,
i.e., $Tf(x) \ge f(x)$, $x \in B$, where $T$ is the one-step transition operator. Prove the
following maximum principle:

$$\sup_{x \in B \cup \partial B} f(x) = \sup_{x \in \partial B} f(x).$$


Problem 8.8.22. Prove that every bounded harmonic function on $\mathbb{Z}^d$ must be
constant.


Problem 8.8.23. Prove that all bounded solutions $V = V(x)$ to the problem:

$$\Delta V(x) = 0 \ \text{ for } x \in C, \quad \text{and} \quad V(x) = g(x) \ \text{ for } x \in \partial C,$$

where $C \subseteq \mathbb{Z}^d$ is a given domain, and $g = g(x)$ is some (also given) bounded
function on $\partial C$, can be written as

$$V(x) = E_x\bigl[g(X_{\tau(\partial C)})\,I(\tau(\partial C) < \infty)\bigr] + \alpha\,P_x\{\tau(\partial C) = \infty\},$$

for some $\alpha \in \mathbb{R}$, where $\tau(\partial C) = \inf\{n \ge 0 : X_n \in \partial C\}$.


Problem 8.8.24. Prove the following results about the solution to the homogeneous
Dirichlet problem: find a function $V = V(x)$ that is harmonic in the domain
$C \subseteq \mathbb{Z}^d$ (i.e., $\Delta V(x) = 0$, $x \in C$) and satisfies the boundary condition $V(x) = g(x)$,
$x \in \partial C$, for a given function $g = g(x)$ that is defined on $\partial C$:

(a) if $d \le 2$ and the function $g = g(x)$ is bounded, then, in the class of bounded
functions, the solution is unique and is given by the formula $V_{\partial C}(x) = E_x\,g(X_{\tau(\partial C)})$;

(b) if $d \ge 3$, $g = g(x)$ is bounded and $P_x\{\tau(\partial C) < \infty\} = 1$, for all $x \in C$,
then, in the class of bounded functions, the solution is again unique and is given by
the formula in (a).


Problem 8.8.25. Let $X = (X_n)_{n \ge 0}$ be a simple symmetric random walk on $\mathbb{Z}^d$,
$d \ge 1$, and suppose that the domain $C \subseteq \mathbb{Z}^d$ is bounded and its boundary $\partial C$ is
defined as $\{x \in \mathbb{Z}^d : x \notin C \text{ and } \|x - y\| = 1 \text{ for some } y \in C\}$. Prove that the
Dirichlet problem:

find a function $V = V(x)$ on $C \cup \partial C$, such that
$$\Delta V(x) = -h(x) \ \text{ for } x \in C, \quad \text{and} \quad V(x) = g(x) \ \text{ for } x \in \partial C,$$
where $h = h(x)$, $x \in C$, and $g = g(x)$, $x \in \partial C$, are given functions,

has a unique solution, given by the formula

$$V_{\partial C}(x) = E_x\,g(X_{\tau(\partial C)}) + E_x\Bigl[\sum_{k=0}^{\tau(\partial C)-1} h(X_k)\Bigr], \qquad E_x\Bigl[\sum_{k=0}^{\tau(\partial C)-1} |h(X_k)|\Bigr] < \infty.$$

Hint. Use the method described in Sect. A.7 in the Appendix and the fact that,
because of the finiteness of $C$, one must have $P_x\{\tau(\partial C) < \infty\} = 1$, $x \in C$
(comp. with Problem 8.8.11).


Problem 8.8.26. Consider the simple random walk $S = (S_n)_{n \ge 0}$, defined on
$\mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$ by $S_0 = 0$ and $S_n = \xi_1 + \ldots + \xi_n$, $n \ge 1$, where $\xi_1, \xi_2, \ldots$ are
independent Bernoulli random variables, with $P\{\xi_i = 1\} = P\{\xi_i = -1\} = 1/2$,
and let $\tau = \inf\{n \ge 0 : S_n = 1\}$. In Problem 7.2.18 it was proposed to show (by
using martingale methods) that, given any $|\alpha| < 1$, one must have

$$E\alpha^\tau = \alpha^{-1}\bigl[1 - \sqrt{1 - \alpha^2}\,\bigr].$$

Derive the last relation from the strong Markov property, by showing first that the
function $\varphi(\alpha) = E\alpha^\tau$ satisfies the relation $\varphi(\alpha) = \frac{1}{2}\alpha + \frac{1}{2}\alpha\,\varphi^2(\alpha)$.


Problem 8.8.27. In this problem it is proposed to carry out certain calculations in 
the model developed by T. and P. Ehrenfest, which was meant to explain the absence 
of (the seemingly existent) contradiction between “irreversibility” and “recurrence” 
in Boltzmann’s kinetic theory of heat propagation. 

As is well known, this theory stems from the representation of the molecular 
structure of the matter and the consequent treatment of the heat exchange as a 
diffusion process. It was developed by Boltzmann for the purpose of explaining the 
(mostly phenomenological) theoretical conclusions of thermodynamics, based on 
the hypothesis that the distribution of heat is irreversible and moves toward a thermal 
equilibrium. Although Boltzmann also believed that thermal equilibrium in the 
system prevails and leads to a state that maximizes the entropy, the “stochastic” 
theory that he proposed did not exclude—in theory at least—the possibility that over 
time the system may return to its original thermodynamic disequilibrium, which 
was the basis for criticism of the kinetic theory. (Poincaré noted the possibility 
for “recurrence” in the case of dynamical systems described in terms of measure- 
preserving transformations—see [P §5.1].) 

Boltzmann himself claimed that there was no contradiction between “irreversibil- 
ity” and the physically unobservable “recurrence,” since in a stochastic system the 
return to states of macroscopic non-equilibrium is possible, but occurs after such a 
long period of time that it is practically unobservable. 

From a physical point of view, the model developed by the Ehrenfests, which
was formulated in terms of the theory of Markov chains, was quite adequate, as it 
was able to describe the exchange of heat between two bodies that are in contact 
with each other, but are otherwise isolated from their environment. This aspect of 
the model allows for an interesting quantitative analysis of the average time for 
transition from one state to another. 

Let $E = \{0, 1, \ldots, 2k\}$, where “state $i$” means “there are $i$ molecules in chamber
$A$” (a detailed description of the model proposed by the Ehrenfests can be found in
[P §8.8, 3]). Denote by

$$\tau(i) = \inf\{n \ge 0 : X_n = i\} \quad \text{and} \quad \sigma(i) = \inf\{n > 0 : X_n = i\},$$

respectively, the time of the first visit and the time of the first return to state $i$, with
the usual understanding that $\inf \emptyset = \infty$.
Prove that:

(a) $E_i\,\sigma(i) = 2^{2k}\,\dfrac{i!\,(2k - i)!}{(2k)!}$ and, in particular, the average recurrence time to the
null state is given by $E_0\,\sigma(0) = 2^{2k}$;

(b) $E_k\,\tau(0) = \dfrac{1}{2k}\,2^{2k}\,(1 + O(1/k))$;

(c) $E_0\,\tau(k) = k \ln k + k + O(1)$.

(In [8] one can find the following numerical results: if $k = 10{,}000$ and the exchange
of molecules occurs once per second, then the expected time $E_0\,\tau(k)$ is less than
29 h, whereas $E_k\,\tau(0)$ is astronomically large: $10^{6000}$ years (!).)
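
Formula (a) can be checked exactly for small $k$. The sketch below assumes the usual Ehrenfest transition probabilities $p_{i,i+1} = (2k - i)/(2k)$ and $p_{i,i-1} = i/(2k)$ (the model itself is described in [P §8.8, 3], not here) and uses the standard first-step recursions for a birth-death chain; all function and variable names are illustrative.

```python
from fractions import Fraction
from math import comb


def mean_recurrence_times(k: int) -> list:
    """Exact E_i sigma(i), i = 0..2k, for the Ehrenfest chain with 2k molecules.

    Assumes p_{i,i+1} = (2k - i)/(2k) and p_{i,i-1} = i/(2k).
    """
    n = 2 * k
    up = [Fraction(n - i, n) for i in range(n + 1)]
    down = [Fraction(i, n) for i in range(n + 1)]

    # m[i] = E_i tau(i + 1): first-step analysis gives up[i]*m[i] = 1 + down[i]*m[i-1]
    m = [Fraction(0)] * (n + 1)
    m[0] = Fraction(1)
    for i in range(1, n):
        m[i] = (1 + down[i] * m[i - 1]) / up[i]

    # d[i] = E_i tau(i - 1): symmetric recursion from the top state downwards
    d = [Fraction(0)] * (n + 1)
    d[n] = Fraction(1)
    for i in range(n - 1, 0, -1):
        d[i] = (1 + up[i] * d[i + 1]) / down[i]

    times = []
    for i in range(n + 1):
        t = Fraction(1)                 # the first step
        if i < n:
            t += up[i] * d[i + 1]       # stepped up, must come back down to i
        if i > 0:
            t += down[i] * m[i - 1]     # stepped down, must climb back up to i
        times.append(t)
    return times


k = 3
expected = [Fraction(2 ** (2 * k), comb(2 * k, i)) for i in range(2 * k + 1)]
print(mean_recurrence_times(k) == expected)
```

Exact rational arithmetic is used so that the comparison with $2^{2k}\,i!\,(2k-i)!/(2k)!$ is an equality test rather than a floating-point approximation.
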


8.9 The Optimal Stopping Problem for Markov Chains 


Problem 8.9.1. Prove by way of example that for Markov chains with countable
state space the optimal stopping time may not exist (within the class $\mathfrak{M}_0^\infty$).


Problem 8.9.2. Verify that the time t,, introduced in the proof of [P §8.9, 
Theorem 2], is a Markov time. 


Problem 8.9.3. Prove that the sequence $X = (X_1, X_2, \ldots)$, which was defined
in [P §8.9, 7] in the description of “the marriage problem,” forms a Markov chain.


Problem 8.9.4. Let $X = (X_n)_{n \ge 0}$ be some homogeneous Markov chain with
values in $\mathbb{R}$ and with transition function $P = P(x; B)$, $x \in \mathbb{R}$, $B \in \mathscr{B}(\mathbb{R})$. We
say that the $\mathbb{R}$-valued function $f = f(x)$, $x \in \mathbb{R}$, is harmonic (or $P$-harmonic, or
harmonic relative to the transition function $P$), if

$$E_x|f(X_1)| = \int_{\mathbb{R}} |f(y)|\,P(x; dy) < \infty, \quad x \in \mathbb{R},$$

and

$$f(x) = \int_{\mathbb{R}} f(y)\,P(x; dy), \quad x \in \mathbb{R}. \tag{$*$}$$

If the identity “$=$” in $(*)$ is replaced by the inequality “$\ge$” we say that the function
$f$ is superharmonic (see also Sect. A.7).

Prove that if $f$ is a superharmonic function, then, for every $x \in \mathbb{R}$, the sequence
$(f(X_n))_{n \ge 0}$, with $X_0 = x$, is a supermartingale (relative to the measure $P_x$).


Problem 8.9.5. Prove that the time t, which appears in [P §8.9, (40)], belongs to 
the class M. 


Problem 8.9.6. Analogously to Example 1 in [P §8.9, 6], consider the optimal
stopping problems

$$s_N(x) = \sup_{\tau \in \mathfrak{M}_0^N} E_x\,g(X_\tau)$$

and

$$s(x) = \sup_{\tau \in \mathfrak{M}_0^\infty} E_x\,g(X_\tau),$$

for all simple random walks from the Examples in [P §8.8].

Problem 8.9.7. (Controlled Markov chains and optimization.) Let $\{P(a), a \in A\}$
be some family of transition probability matrices $P(a) = \|p_{ij}(a)\|$, parametrized by
the collection $A$ of all possible choices for the “control.” The associated phase space
$E = \{i, j, \ldots\}$ is assumed to be either finite or countably infinite and any function
$u = u(i)$, $i \in E$, which takes values in the space $A$, will be treated as a “possible
control strategy,” i.e., a prescription for the value of the control parameter in every
state $i \in E$.

Given a particular choice for the control $u = u(i)$, $i \in E$, we denote by $P^u$ the
associated transition probability matrix $\|p_{ij}(u(i))\|$, from which one can obtain (see,
for example, the Ionescu Tulcea Theorem in [P §2.9]) the respective probability
distribution, $P_i^u$, $i \in E$, in the space $E^\infty$; this is nothing but the probability
distribution of the Markov chain $X = (X_n)_{n \ge 0}$ that starts from state $X_0 = i$ and is
being “steered” by the control $u$.

Let $C$ be some domain inside the phase space $E$, set $D = E \setminus C$, and consider
the functions $h = h(i, a)$, $i \in C$, $a \in A$, and $g = g(i, a)$, $i \in D$, $a \in A$, which, for
now, will be assumed non-negative. For every (fixed) choice of the control $u = u(i)$,
$i \in E$, we write $h^u(i) = h(i, u(i))$ and $g^u(i) = g(i, u(i))$. The “gain” associated
with the control $u = u(i)$, $i \in E$, when the chain is in state $j \in E$ is given by

$$V^u(j) = E_j^u\Bigl[g^u(X_{\tau(D)})\,I(\tau(D) < \infty) + \sum_{k=0}^{\tau(D)-1} h^u(X_k)\Bigr],$$

where $\tau(D) = \inf\{n \ge 0 : X_n \in D\}$. The meaning of the quantity $V^u(j)$ should
be clear: it represents the expected aggregate gains, including the cumulative gains
$h^u$ and the termination gain $g^u$, collected while the chain remains in the domain $C$,
assuming that the initial state is $X_0 = j$ and the chain is subjected to the control
$u = u(i)$, $i \in E$.

The optimal control problem associated with the gain function $V^u(j)$ comes
down to computing the value function

$$V^*(j) = \sup_u V^u(j), \quad j \in E,$$

and the optimal control $u^* = u^*(i)$, $i \in E$, if one exists, with $V^{u^*}(j) = \sup_u V^u(j)$.


Prove the following statement, which is known as the “verification theorem:”
Suppose that

(i) There is a function $V = V(j)$, $j \in E$, such that

$$V(j) = \sup_{a \in A}\Bigl[\sum_{k \in E} p_{jk}(a)\,V(k) + h(j, a)\Bigr], \quad j \in C,$$

and

$$V(j) = \sup_{a \in A} g(j, a), \quad j \in D;$$

(ii) In the class of admissible controls, one can find a control $u^* = u^*(i)$, $i \in E$,
such that for any fixed $j$ both suprema above are achieved with $a = u^*(j)$.

Then the control $u^* = u^*(i)$, $i \in E$, is optimal: for every admissible control $u$
one has

$$V^{u^*}(j) \ge V^u(j) \quad \text{and} \quad V^{u^*}(j) = V(j), \quad j \in E.$$

Hint. Use the fact that for every admissible control $u$ one must have

$$V(j) \ge T^u V(j) + h^u(j), \quad j \in C, \quad \text{where } T^u V(j) = E_j^u V(X_1),$$

and

$$V(j) \ge g^u(j), \quad j \in D.$$

Then use the fact that with $u = u^*$ the above inequalities turn into equalities.


Problem 8.9.8. (The “disorder” problem.) Consider the Bayesian risk

$$R^\pi(\tau) = P^\pi\{\tau < \theta\} + c\,E^\pi(\tau - \theta)^+,$$

which was introduced in Problem 6.7.8. According to that problem, the infimum of
the quantity $R^\pi(\tau)$, taken over the class $\mathfrak{M}_0^\infty$ of all $P^\pi$-finite ($\pi \in [0, 1]$) Markov
times $\tau$, is attained at the Markov time

$$\tau^* = \inf\{n \ge 0 : \pi_n \ge A\}, \tag{$*$}$$

where $A$ is some constant that may depend on $c$ and $p$.
Prove that
(a) The Bayesian risk $R^\pi(\tau)$ can be written in the form

$$R^\pi(\tau) = E^\pi\Bigl[(1 - \pi_\tau) + c\,I(\tau \ge 1)\sum_{k=0}^{\tau-1} \pi_k\Bigr];$$

(b) In the optimal stopping problem for the Markov chain $(\pi_n)_{n \ge 0}$

$$R^\pi = \inf_{\tau \in \mathfrak{M}_0^\infty} E^\pi\Bigl[(1 - \pi_\tau) + c\,I(\tau \ge 1)\sum_{k=0}^{\tau-1} \pi_k\Bigr],$$

the infimum is achieved with the stopping time $\tau^*$ defined in $(*)$.


Appendix A 

Review of Some Fundamental Concepts 
and Results from Probability Theory 
and Combinatorics 


A.1 Elements of Combinatorics 


In its early stages, the “calculus of probability” was comprised mostly of combina- 
torial methods for counting the (usually finite) number of configurations that lead 
to the realization of certain random events. Even today, these counting techniques 
remain indispensable for the theory of probability, especially for “the elementary
theory of probability,” which deals with finite spaces of elementary outcomes.
In fact, combinatorial methods play a crucial role in many domains of discrete
mathematics, including graph theory and the theory of algorithms.

What follows is a brief summary of some basic notions and results from
combinatorics that are used in the books “Probability” and also in the present
collection of problems.

• Let $A$ be some collection of $N < \infty$ elements $a_1, \ldots, a_N$ (so that $|A| = N$).
If all of these elements are distinct, then the collection $A$ can be referred to as a set
and may be expressed as

$$A = \{a_1, \ldots, a_N\}.$$

In the above notation the order in which the elements $a_1, \ldots, a_N$ are written
is irrelevant. For example, $\{1, 2, 3\}$ and $\{2, 3, 1\}$ refer to one and the same set
comprised of the elements 1, 2 and 3.

With each set $A = \{a_1, \ldots, a_N\}$ one can associate two different types of samples
(sometimes called sequences) of size $n$:

$$(a_{i_1}, \ldots, a_{i_n}) \quad \text{and} \quad [a_{i_1}, \ldots, a_{i_n}],$$

where $i_1, \ldots, i_n \in \{1, \ldots, N\}$ and the symbols $a_{i_j}$ stand for elements of the set $A$,
which may or may not coincide for different values of $j$.

The token $(a_{i_1}, \ldots, a_{i_n})$ is used to denote ordered samples, i.e., samples identified
not only by the collection of their members, but also by the order in which those
members are listed.

The token $[a_{i_1}, \ldots, a_{i_n}]$ is used to denote unordered samples, i.e., samples
identified only by the collection of their members, but not by the order in which those
members are listed.

For example, the samples $(a_4, a_1, a_3, a_1)$ and $(a_1, a_1, a_4, a_3)$ represent one and
the same collection of elements, but are nevertheless different, because these are
ordered samples that differ in the order in which their (identical) members are listed.
At the same time, the samples $[a_4, a_1, a_3, a_1]$ and $[a_1, a_1, a_4, a_3]$ are two identical
unordered samples.

If samples of the form $(a_{i_1}, \ldots, a_{i_n})$, or of the form $[a_{i_1}, \ldots, a_{i_n}]$, are taken from
the set $A$ by way of “sampling without replacement,” then all elements in the
sample must be different and one must have $n \le N$. If samples of the
form $(a_{i_1}, \ldots, a_{i_n})$, or of the form $[a_{i_1}, \ldots, a_{i_n}]$, are taken from the set $A$ by way of
“sampling with replacement,” then the sample may contain identical elements.
Furthermore, in this case the size of the sample, $n$, could be arbitrarily large.

A partition of the set $A$, with $|A| = N$, is any collection, $\mathscr{D} = \{D_1, \ldots, D_n\}$,
$n \le N$, of subsets $D_i \subseteq A$, $1 \le i \le n$, with $D_i \ne \emptyset$, $D_i \cap D_j = \emptyset$ for $i \ne j$,
and $D_1 + \ldots + D_n = A$. The sets $D_i$ are the atoms of the partition $\mathscr{D}$.


Counting Various Samples from a generic set $A = \{a_1, \ldots, a_N\}$;
Combinatorial Numbers and Their Interpretation.

(a) $(N)_n = N(N - 1) \cdots (N - n + 1)$ (“number of placements” $N$ to $n$, $1 \le n \le N$):
the number of ordered samples $(\ldots)$ of size $n$, comprised of elements of
the set $A$ with $|A| = N$, by way of “sampling without replacement;”

(b) $C_N^n = \dfrac{(N)_n}{n!} \Bigl(= \dfrac{N!}{n!\,(N - n)!}\Bigr)$ (“number of combinations” $n$ of $N$, binomial coefficients):
the number of unordered samples $[\ldots]$ of size $n$, comprised of elements
of the set $A$, with $|A| = N$, by way of “sampling without replacement;”

(c) $N^n$: the number of ordered samples $(\ldots)$ of size $n \ge 1$, comprised of elements
of the set $A$, with $|A| = N$, by way of “sampling with repetition;”

(d) $C_{N+n-1}^n$: the number of unordered samples $[\ldots]$ of size $n \ge 1$, comprised of
elements of the set $A$, with $|A| = N$, by way of “sampling with repetition.”


For various combinatorial interpretations of the above numbers, see the problems
from Sects. 1.1 and 1.2. In particular, according to Problem 1.1.3 the number
of ordered sequences $(\ldots)$ of length $N$, that consist of $n$ “ones” and $N - n$
“zeroes”, equals $C_N^n$. This result is particularly important in the elementary theory
of probability, in connection with the binomial distribution.

Example A.1.1. Consider the set $A = \{a_1, a_2, a_3, a_4\}$, in which $|A| = 4$, and let
$n = 2$. Then one has

(a) $(4)_2 = 4(4 - 1) = 12$. There are 12 ordered samples of size $n$:

$(a_1, a_2)$, $(a_1, a_3)$, $(a_1, a_4)$, $(a_2, a_1)$, $(a_2, a_3)$, $(a_2, a_4)$,

$(a_3, a_1)$, $(a_3, a_2)$, $(a_3, a_4)$, $(a_4, a_1)$, $(a_4, a_2)$, $(a_4, a_3)$.

(b) $C_4^2 = \dfrac{4!}{2!\,2!} = 6$. There are 6 unordered samples of size $n$:

$[a_1, a_2]$, $[a_1, a_3]$, $[a_1, a_4]$, $[a_2, a_3]$, $[a_2, a_4]$, $[a_3, a_4]$.

(c) $4^2 = 16$. In addition to the 12 samples listed in (a) one must include the samples
$(a_1, a_1)$, $(a_2, a_2)$, $(a_3, a_3)$ and $(a_4, a_4)$.

(d) $C_{4+2-1}^2 = C_5^2 = \dfrac{5!}{2!\,3!} = 10$. In addition to the 6 samples listed in (b) one must
include also the samples $[a_1, a_1]$, $[a_2, a_2]$, $[a_3, a_3]$ and $[a_4, a_4]$.
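
The four counts in Example A.1.1 can be reproduced by brute-force enumeration. The following is only an illustrative sketch using Python's standard `itertools` module; the string labels `"a1"`, ..., `"a4"` stand in for the elements of $A$.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

A = ["a1", "a2", "a3", "a4"]
n = 2

ordered_without = list(permutations(A, n))                   # case (a)
unordered_without = list(combinations(A, n))                 # case (b)
ordered_with = list(product(A, repeat=n))                    # case (c)
unordered_with = list(combinations_with_replacement(A, n))   # case (d)

counts = (len(ordered_without), len(unordered_without),
          len(ordered_with), len(unordered_with))
print(counts)  # (12, 6, 16, 10)
```

Each list contains the samples themselves, so printing, say, `unordered_with` shows the 6 samples from (b) together with the 4 samples with repeated elements.
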


Counting Subsets and Partitions of a generic set $A = \{a_1, \ldots, a_N\}$;
Combinatorial Numbers and Their Interpretation

(e) $2^N$: the number of all subsets of $A$ (including the empty set $\emptyset$ and the
set $A$ itself), with $|A| = N$.

(f) $C_N^n = \dfrac{N!}{n!\,(N - n)!}$: the number of subsets $D \subseteq A$ of size $0 \le n \le N$
($|D| = n$, $|A| = N$, with the understanding that $D = \emptyset$ when $n = 0$ and $C_N^0 = 1$).

(g) $C_N(n_1, \ldots, n_r) = \dfrac{N!}{n_1! \cdots n_r!}$ (the “multinomial,” or “polynomial” coefficients,
$n_1 + \ldots + n_r = N$): the number of partitions $\mathscr{D} = \{D_1, \ldots, D_r\}$ of the set $A$
with $|A| = N$ into $r$ disjoint sets $D_1, \ldots, D_r$, $r \le N$, with $|D_1| = n_1, \ldots, |D_r| = n_r$,
$n_1 + \ldots + n_r = N$.

(h) $D_N(\lambda_1, \ldots, \lambda_N) = \dfrac{N!}{(1!)^{\lambda_1} \cdots (N!)^{\lambda_N}\,\lambda_1! \cdots \lambda_N!}$
($\lambda_i \ge 0$ for all $i$ and $\sum_{i=1}^{N} i\lambda_i = N$): the number of “block” partitions of the
set $A$ with $|A| = N$, of the form
$$\mathscr{D} = \{[D_{11}, \ldots, D_{1\lambda_1}], \ldots, [D_{N1}, \ldots, D_{N\lambda_N}]\},$$
where the “block” $[D_{i1}, \ldots, D_{i\lambda_i}]$ consists of $\lambda_i$ sets, every one of which has
$i$ elements ($|D_{ik}| = i$, $1 \le k \le \lambda_i$); if $\lambda_i = 0$, then the respective block
is undefined and is not included in the partition $\mathscr{D}$.

(i) $S_N^n = \sum D_N(\lambda_1, \ldots, \lambda_N)$ (the summation is taken over all choices of
$\lambda_1, \ldots, \lambda_N$, for which $\sum_{i=1}^{N} \lambda_i = n$, $\sum_{i=1}^{N} i\lambda_i = N$, and $\lambda_i \ge 0$ for all $i$):
the number of partitions $\mathscr{D}$ of the set $A$, with $|A| = N$, that consist of exactly $n$
classes.

The numbers $S_N^n$, $1 \le n \le N$, are known as Stirling numbers of the second kind.$^1$

(j) $B_N = \sum_{n=1}^{N} S_N^n$: the number of partitions of the set $A$, with $|A| = N$.

The numbers $B_N$ are known as Bell numbers.

(Some additional properties of the numbers introduced in (f)-(j) above can be found
in the problems from Sect. 1.2.)


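Example A.1.2. Consider again the set $A = \{a_1, a_2, a_3, a_4\}$ and let $N = |A| = 4$
and $n = 2$.

(e) $2^4 = 16$. The 16 sets are given by:

$\emptyset$, $\{a_1\}$, $\{a_2\}$, $\{a_3\}$, $\{a_4\}$,

$\{a_1, a_2\}$, $\{a_1, a_3\}$, $\{a_1, a_4\}$, $\{a_2, a_3\}$, $\{a_2, a_4\}$, $\{a_3, a_4\}$,

$\{a_1, a_2, a_3\}$, $\{a_1, a_2, a_4\}$, $\{a_1, a_3, a_4\}$, $\{a_2, a_3, a_4\}$, $\{a_1, a_2, a_3, a_4\}$.

(f) $C_4^2 = 6$. The 6 sets are given by:

$\{a_1, a_2\}$, $\{a_1, a_3\}$, $\{a_1, a_4\}$, $\{a_2, a_3\}$, $\{a_2, a_4\}$, $\{a_3, a_4\}$.

(g) If $r = 2$, $n_1 = 1$ and $n_2 = 3$, then $C_4(1, 3) = \dfrac{4!}{1!\,3!} = 4$. The 4 partitions are
given by:

$\{a_1\}$ and $\{a_2, a_3, a_4\}$, $\{a_2\}$ and $\{a_1, a_3, a_4\}$,

$\{a_3\}$ and $\{a_1, a_2, a_4\}$, $\{a_4\}$ and $\{a_1, a_2, a_3\}$.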


(h) $\lambda_1 = 2$, $\lambda_2 = 1$, $\lambda_3 = 0$, $\lambda_4 = 0$; $\sum_{i=1}^{4} i\lambda_i = 4$,

$$D_4(2, 1, 0, 0) = \frac{4!}{(1!)^2 (2!)^1 (3!)^0 (4!)^0\; 2!\,1!\,0!\,0!} = 6.$$

The 6 “block” partitions are given by:

$[\{a_1\}, \{a_2\}]$ and $[\{a_3, a_4\}]$, $[\{a_1\}, \{a_3\}]$ and $[\{a_2, a_4\}]$,
$[\{a_1\}, \{a_4\}]$ and $[\{a_2, a_3\}]$, $[\{a_2\}, \{a_3\}]$ and $[\{a_1, a_4\}]$,
$[\{a_2\}, \{a_4\}]$ and $[\{a_1, a_3\}]$, $[\{a_3\}, \{a_4\}]$ and $[\{a_1, a_2\}]$.

$^1$ For the Stirling numbers of the first kind, see p. 377 below.


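(i) $S_4^2 = D_4(0, 2, 0, 0) + D_4(1, 0, 1, 0) = 3 + 4 = 7$. The 7 partitions are:

$\{a_1\}$ and $\{a_2, a_3, a_4\}$, $\{a_2\}$ and $\{a_1, a_3, a_4\}$, $\{a_3\}$ and $\{a_1, a_2, a_4\}$, $\{a_4\}$ and $\{a_1, a_2, a_3\}$,

$\{a_1, a_2\}$ and $\{a_3, a_4\}$, $\{a_1, a_3\}$ and $\{a_2, a_4\}$, $\{a_1, a_4\}$ and $\{a_2, a_3\}$.

Analogous calculations show, for example, that:

$S_4^1 = 1$, $S_4^2 = 7$, $S_4^3 = 6$, $S_4^4 = 1$,

$S_5^1 = 1$, $S_5^2 = 15$, $S_5^3 = 25$, $S_5^4 = 10$, $S_5^5 = 1$,

$S_6^1 = 1$, $S_6^2 = 31$, $S_6^3 = 90$, $S_6^4 = 65$, $S_6^5 = 15$, $S_6^6 = 1$.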


(j) With $N = 4$ property (i) implies that
$$B_4 = S_4^1 + S_4^2 + S_4^3 + S_4^4 = 1 + 7 + 6 + 1 = 15,$$
$$B_5 = 1 + 15 + 25 + 10 + 1 = 52,$$
$$B_6 = 1 + 31 + 90 + 65 + 15 + 1 = 203.$$

The respective 15 partitions (for $N = 4$) are:

$\{a_1, a_2, a_3, a_4\}$; $\{a_1\}$, $\{a_2\}$, $\{a_3\}$, $\{a_4\}$;

$\{a_1\}$ and $\{a_2, a_3, a_4\}$; $\{a_2\}$ and $\{a_1, a_3, a_4\}$;

$\{a_3\}$ and $\{a_1, a_2, a_4\}$; $\{a_4\}$ and $\{a_1, a_2, a_3\}$;

$\{a_1, a_2\}$ and $\{a_3, a_4\}$; $\{a_1, a_3\}$ and $\{a_2, a_4\}$; $\{a_1, a_4\}$ and $\{a_2, a_3\}$;

$\{a_1\}$, $\{a_2\}$ and $\{a_3, a_4\}$; $\{a_1\}$, $\{a_3\}$ and $\{a_2, a_4\}$; $\{a_1\}$, $\{a_4\}$ and $\{a_2, a_3\}$;
$\{a_2\}$, $\{a_3\}$ and $\{a_1, a_4\}$; $\{a_2\}$, $\{a_4\}$ and $\{a_1, a_3\}$; $\{a_3\}$, $\{a_4\}$ and $\{a_1, a_2\}$.
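
The Stirling and Bell numbers quoted in Example A.1.2 can be recomputed mechanically. The sketch below does not use the block-partition sum from (i); instead it assumes the standard recurrence $S_N^n = n\,S_{N-1}^n + S_{N-1}^{n-1}$, which is not stated in the text above. The function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(N: int, n: int) -> int:
    """Stirling number of the second kind S_N^n via the standard recurrence."""
    if N == n:
        return 1            # one partition into singletons
    if n == 0 or n > N:
        return 0            # the conventions S_N^0 = 0 and S_N^n = 0 for n > N
    # element N either joins one of the n existing classes or forms a new class
    return n * stirling2(N - 1, n) + stirling2(N - 1, n - 1)


def bell(N: int) -> int:
    """Bell number B_N = sum of S_N^n over n = 1..N (intended for N >= 1)."""
    return sum(stirling2(N, n) for n in range(1, N + 1))


for N in (4, 5, 6):
    print(N, [stirling2(N, n) for n in range(1, N + 1)], bell(N))
```
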


• There is more to combinatorics than the mere counting of all favorable configurations
for various events encountered in the elementary theory of probability.
For example, combinatorial reasoning is often used to establish identities like the
following one:

$$n^N = \sum_{k=1}^{n} S_N^k \times (n)_k, \quad 1 \le n \le N, \tag{$*$}$$

where $S_N^k$ are the Stirling numbers of the second kind. (Note that $S_N^1 = S_N^N = 1$
and, by the usual convention, $S_N^0 = 0$ and $S_N^n = 0$, for $n > N$.)

The combinatorial proof of the above identity is based on the idea that both
sides represent one and the same number of configurations, except that these
configurations are counted in two different ways. More specifically, let $A$ and $B$
denote any two finite sets with $|A| = N$ and $|B| = n$. Consider a generic function
of the form $y = f(x)$, which is defined for $x \in A$ and takes values in the set $B$. How
many such functions can one find? Since one can assign to each of the $N$ possible
values of $x$ any one of the $n$ possible values $y$, it is clear that the total number of
functions from $A$ to $B$ must be $n^N$.

The total number of functions from $A$ to $B$ can be counted also by considering the
pre-image $f^{-1}(y) = \{x : f(x) = y\}$ of any given $y \in B$. With this construction
in mind, given any subset $C \subseteq B$ with $|C| = k$, for some $1 \le k \le n$, consider the
collection of all functions $y = f(x)$, for which one can claim that $\mathrm{Range}(f) = C$.
Since $|C| = k$, any function $f$ from this collection defines a partition of $A$ that
consists of $k$ disjoint classes, characterized by the property that $f$ takes one and the
same value on each class and different values on different classes, i.e., the classes
in the partition are simply the pre-images under $f$ of the elements of $C$. As the
total number of such partitions is $S_N^k$, and for each partition there are $(k)_k = k!$
functions that assign different values from the set $C$ to the classes in the partition,
the total number of functions from $A$ to $B$ whose range is precisely $C$ must be
$S_N^k \times k!$.

As there are $C_n^k$ possible selections of the set $C$ with $|C| = k$, the total number
of functions $y = f(x)$, defined for $x \in A$ and taking values in $B$, with $|A| = N$
and $|B| = n$, must be

$$\sum_{k=1}^{n} C_n^k \times S_N^k \times k! = \sum_{k=1}^{n} S_N^k \times (n)_k.$$

But the number of the elements in the same collection of functions was also found
to be $n^N$, so that the identity $(*)$ is now established. (Many problems and examples
of the use of “double counting” and other combinatorial techniques can be found in
the books [20, 27, 46, 110, 111].)
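
The double-counting argument can be replayed by brute force for small $N$ and $n$. The sketch below enumerates all functions from an $N$-element set to an $n$-element set and groups them by the size of their range; the helper names are illustrative and the enumeration is only feasible for small values.

```python
from itertools import product
from math import factorial


def count_by_range_size(N: int, n: int) -> dict:
    """Number of functions {0..N-1} -> {0..n-1}, keyed by the size of the range."""
    counts = {}
    for f in product(range(n), repeat=N):    # f[x] is the value assigned to x
        k = len(set(f))
        counts[k] = counts.get(k, 0) + 1
    return counts


def stirling2_from_functions(N: int, k: int) -> int:
    """Recover S_N^k as (number of functions onto a k-element set) / k!."""
    return count_by_range_size(N, k).get(k, 0) // factorial(k)


counts = count_by_range_size(4, 3)
print(counts, sum(counts.values()))   # the total is 3**4 = 81
```

For $N = 4$, $n = 3$ the groups have sizes $C_3^k \times S_4^k \times k!$, i.e., $3 \cdot 1 \cdot 1 = 3$, $3 \cdot 7 \cdot 2 = 42$ and $1 \cdot 6 \cdot 6 = 36$, which add up to $3^4 = 81$.
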
Table A.1 Factorials and their logarithms (base 10)

 n   n!                                            log10 n!
 1   1                                             0
 2   2                                             0.3010300
 3   6                                             0.7781513
 5   120                                           2.0791812
10   3 628 800                                     6.5597630
15   1 307 674 368 000                             12.1164996
20   2 432 902 008 176 640 000                     18.3861246
25   15 511 210 043 330 985 984 000 000            25.1906457
30   265 252 859 812 191 058 636 308 480 000 000   32.4236601


Table A.2 Binomial coefficients

n\k    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
  1    1    1
  2    1    2    1
  3    1    3    3    1
  4    1    4    6    4    1
  5    1    5   10   10    5    1
  6    1    6   15   20   15    6    1
  7    1    7   21   35   35   21    7    1
  8    1    8   28   56   70   56   28    8    1
  9    1    9   36   84  126  126   84   36    9    1
 10    1   10   45  120  210  252  210  120   45   10    1
 11    1   11   55  165  330  462  462  330  165   55   11    1
 12    1   12   66  220  495  792  924  792  495  220   66   12    1
 13    1   13   78  286  715 1287 1716 1716 1287  715  286   78   13    1
 14    1   14   91  364 1001 2002 3003 3432 3003 2002 1001  364   91   14    1
 15    1   15  105  455 1365 3003 5005 6435 6435 5005 3003 1365  455  105   15    1


The first several values of the quantities $n!$ and $C_n^k$ are given in Tables A.1
and A.2.
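
Both tables are easy to regenerate, which is also a convenient way to check individual entries. The sketch below uses only the Python standard library; the seven-decimal formatting is chosen to match Table A.1, and the variable names are illustrative.

```python
from math import comb, factorial, log10

# Table A.1: n, n! and the base-10 logarithm of n!
factorial_rows = [(n, factorial(n), f"{log10(factorial(n)):.7f}")
                  for n in (1, 2, 3, 5, 10, 15, 20, 25, 30)]
for row in factorial_rows:
    print(*row)

# Table A.2: row n holds C_n^0, ..., C_n^n for n = 1..15
pascal = {n: [comb(n, k) for k in range(n + 1)] for n in range(1, 16)}
print(pascal[11])
```
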


A.2 Basic Probabilistic Structures and Concepts 


The most basic structure, on which essentially any probabilistic or statistical
analysis is usually carried out, is that of a probability space, or a probabilistic model,
which is a triplet of the form (see [P §2.1])

$$(\Omega, \mathscr{F}, P),$$

where:

$\Omega$ is the space of elementary outcomes $\omega$;

$\mathscr{F}$ is a $\sigma$-algebra of subsets of $\Omega$;

$P$ is a probability measure on $\mathscr{F}$, i.e., a $\sigma$-additive function of the set
$A \in \mathscr{F}$, such that $0 \le P(A) \le 1$, $P(\emptyset) = 0$, $P(\Omega) = 1$.

• In addition to the notion of $\sigma$-algebra, which is an essential ingredient in the
structure of any probability space, sometimes one must work with other systems of
subsets: algebras, separable $\sigma$-algebras, monotone classes, $\pi$-systems, $\lambda$-systems,
$\pi$-$\lambda$-systems, families of cylindrical sets, etc. (see [P §2.2]).

• The events (or the sets) $A$ and $B$ are said to be independent, if
$P(A \cap B) = P(A) \times P(B)$.

Two systems, $\mathscr{G}_1$ and $\mathscr{G}_2$, of sets from $\mathscr{F}$ are said to be independent if for any set
$A \in \mathscr{G}_1$ and any set $B \in \mathscr{G}_2$ one can claim that $A$ and $B$ are independent.

The sets $A_1, \ldots, A_n \in \mathscr{F}$ are said to be independent, if, for every $k = 1, \ldots, n$
and $1 \le i_1 < i_2 < \ldots < i_k \le n$, one has

$$P(A_{i_1} \cap \ldots \cap A_{i_k}) = P(A_{i_1}) \cdots P(A_{i_k}).$$

The independence of the systems $\mathscr{G}_1, \ldots, \mathscr{G}_n$, all comprised of sets from $\mathscr{F}$, can be
defined in a similar fashion.

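• A measurable space is a pair of the form $(E, \mathscr{E})$, where $E$ is a set and $\mathscr{E}$ is a
$\sigma$-algebra comprised of subsets of $E$.

The most common measurable spaces are (see [P §2.2]):

$(\mathbb{R}, \mathscr{B}(\mathbb{R}))$: the real line $\mathbb{R}$ endowed with the Borel $\sigma$-algebra $\mathscr{B}(\mathbb{R})$
(often denoted simply by $\mathscr{B}$);

$(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n))$: the space $\mathbb{R}^n = \mathbb{R} \times \ldots \times \mathbb{R}$ endowed with the $\sigma$-algebra
$\mathscr{B}(\mathbb{R}^n) = \mathscr{B}(\mathbb{R}) \otimes \ldots \otimes \mathscr{B}(\mathbb{R})$;

$(\mathbb{R}^\infty, \mathscr{B}(\mathbb{R}^\infty))$: the space $\mathbb{R}^\infty = \mathbb{R} \times \ldots \times \mathbb{R} \times \ldots$ endowed with the $\sigma$-algebra
$\mathscr{B}(\mathbb{R}^\infty)$, generated by all cylinder sets;

$(\mathbb{R}^T, \mathscr{B}(\mathbb{R}^T))$: the space $\mathbb{R}^T$ of all functions that map the (generic) set $T$
into the real line $\mathbb{R}$, endowed with the $\sigma$-algebra $\mathscr{B}(\mathbb{R}^T)$,
generated by all cylinder sets in the space $\mathbb{R}^T$;

$(C, \mathscr{B}(C))$: the space $C$ of continuous functions (e.g., continuous functions
on $[0, 1]$ or $[0, \infty)$) endowed with the $\sigma$-algebra $\mathscr{B}(C)$,
generated by all open sets for the usual topology of convergence
on compacts (or, which amounts to the same,
generated by all cylinder sets);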



$(D, \mathscr{B}(D))$: the space $D$ of right-continuous functions with left limits
(e.g., functions on $[0, 1]$ that are right-continuous at any
$t < 1$ and have left limits at any $t > 0$), endowed
with the $\sigma$-algebra $\mathscr{B}(D)$, generated by all open sets for
the Skorokhod metric (or, which amounts to the same,
generated by all cylinder sets).


• A random variable is any function $X = X(\omega)$ which is defined on some
measurable space $(\Omega, \mathscr{F})$, takes values in $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, and is $\mathscr{F}$-measurable, in the
sense that

$$\{\omega : X(\omega) \in B\} \in \mathscr{F}$$

for any Borel set $B \in \mathscr{B}(\mathbb{R})$.
The simplest, and at the same time very important, example of a random variable
is the indicator, $X(\omega) = I_A(\omega)$, of a generic set $A \in \mathscr{F}$, which is given by

$$I_A(\omega) = \begin{cases} 1, & \omega \in A; \\ 0, & \omega \notin A. \end{cases}$$

A random element is any $\mathscr{F}/\mathscr{E}$-measurable map $X = X(\omega)$ from $\Omega$ into $E$
(i.e., $\{\omega : X(\omega) \in B\} \in \mathscr{F}$ for any $B \in \mathscr{E}$), where $(\Omega, \mathscr{F})$ and $(E, \mathscr{E})$ are two
measurable spaces.

An $n$-dimensional random vector $(X_1(\omega), \ldots, X_n(\omega))$ is simply an ordered list of
random variables $X_1(\omega), \ldots, X_n(\omega)$.

A random sequence, or, equivalently, a random process in discrete time,
$X = (X_n(\omega))_{n \ge 1}$, is simply a sequence of random variables $X_1(\omega), X_2(\omega), \ldots$

A random process, $X = (X_t(\omega))_{t \in T}$, on the time interval $T \subseteq \mathbb{R}$ is simply a
collection of random variables parameterized by the set $T$: $X_t(\omega)$, $t \in T$.

• A distribution function, $F = F(x)$, on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, is any $\mathscr{B}(\mathbb{R})$-measurable
function on $\mathbb{R}$ which has the following properties:

1. $F(x)$ is non-decreasing;
2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to +\infty} F(x) = 1$;
3. $F(x)$ is right-continuous and admits left limits at any point $x \in \mathbb{R}$.


If $X = X(\omega)$ is a random variable defined on the probability space $(\Omega, \mathscr{F}, P)$,
then the probability measure $P_X$ on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, given by

$$P_X(B) = P\{\omega : X(\omega) \in B\},$$

is known as the probability distribution of the random variable $X = X(\omega)$. It is
easy to see that the function $F_X(x) = P_X((-\infty, x])$ is a distribution function
on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$. This function is known as the distribution function of the random
variable $X = X(\omega)$.

If $X = (X_t(\omega))_{t \in T}$ is a random process then the probability distributions on $\mathbb{R}^n$,
for various choices of $n \ge 1$ and $t_1 < \ldots < t_n$, $t_i \in T$, given by

$$P_{t_1, \ldots, t_n}(B) = P\{\omega : (X_{t_1}(\omega), \ldots, X_{t_n}(\omega)) \in B\}, \quad B \in \mathscr{B}(\mathbb{R}^n),$$

are known as the finite-dimensional distributions of the random process $X$. The
associated distribution functions

$$F_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = P\{\omega : X_{t_1}(\omega) \le x_1, \ldots, X_{t_n}(\omega) \le x_n\}$$

are known as finite-dimensional distribution functions of the random process $X$.

• If the reference measure of choice on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ is the Lebesgue measure
$\lambda = \lambda(dx)$, then the “Lebesgue decomposition” (see [P §3.9, (29)] or [P §7.6, (3)])
leads to the following result: any distribution function, $F = F(x)$, on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$
can be decomposed into the sum

$$F(x) = a\,F_{abc}(x) + b\,F_{sing}(x),$$

where the constants $a \ge 0$ and $b \ge 0$ are chosen so that $a + b = 1$ and

$F_{abc}(x)$ is some absolutely continuous distribution function on $\mathbb{R}$ with
(Borel-measurable) density $f = f(y)$, i.e., $f(y) \ge 0$,
$\int_{-\infty}^{\infty} f(y)\,\lambda(dy) = 1$ and $F_{abc}(x) = \int_{-\infty}^{x} f(y)\,\lambda(dy)$, $x \in \mathbb{R}$;

$F_{sing}(x)$ is some singular distribution function on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, in the
sense that the respective probability law, $P_{sing}$, on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ is
singular with respect to the Lebesgue measure $\lambda$ ($P_{sing} \perp \lambda$).


The singular function $F_{sing}(x)$ can be further decomposed into the sum

$$F_{sing}(x) = d \cdot F_{d\text{-}sing}(x) + c \cdot F_{c\text{-}sing}(x),$$

in which the constants $d \ge 0$ and $c \ge 0$ are chosen so that $d + c = 1$, $F_{d\text{-}sing}(x)$ is
a discrete distribution function with the property that the support of the associated
probability measure, $P_{d\text{-}sing}$, is some set inside $\mathbb{R}$ that is at most countable, and
$F_{c\text{-}sing}(x)$ is a continuous distribution function characterized by the property that
the support of the associated probability measure, $P_{c\text{-}sing}$, is some uncountable set
inside $\mathbb{R}$ which is negligible for the Lebesgue measure $\lambda$. (The canonical example
of such a function is the Cantor function $F_{c\text{-}sing}(x)$; see [P §2.3, 1].)
Recall that the support of any measure $\mu$ on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ is defined as the set

$$\mathrm{supp}(\mu) = \{x \in \mathbb{R} : \mu\{y : |y - x| < r\} > 0, \ \forall r > 0\}.$$


As a direct application of the above decompositions, one arrives at the canonical
decomposition (see Problem 2.3.18) of a generic distribution function $F = F(x)$
on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$:

$$F = \alpha_1 F_d + \alpha_2 F_{abc} + \alpha_3 F_{sc},$$

where the constants $\alpha_1 \ge 0$, $\alpha_2 \ge 0$ and $\alpha_3 \ge 0$ are such that $\alpha_1 + \alpha_2 + \alpha_3 = 1$,
and $F_d$ ($= F_{d\text{-}sing}$), $F_{abc}$ and $F_{sc}$ ($= F_{c\text{-}sing}$) are distribution functions on
$(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, which are, respectively, discrete, absolutely continuous, and continuous
and singular.


A.3 Analytical Methods and Tools of Probability Theory 


• An important characteristic of any random variable $X = X(\omega)$, defined on some
probability space $(\Omega, \mathcal{F}, P)$, is its expected value (or simply "expectation") $EX$.

If $X = X(\omega)$ is a non-negative random variable, then its expected value $EX$
is defined as the Lebesgue integral of the function $\omega \mapsto X(\omega)$ with respect to the
measure $P$:

$$EX = \int_{\Omega} X(\omega)\,P(d\omega).$$

If $X = X(\omega)$ is an arbitrary (i.e., not necessarily non-negative) random variable,
then one can write $X = X^+ - X^-$, where $X^+ = \max(X, 0)$ and $X^- = -\min(X, 0)$,
and the expected value $EX$ is said to exist, or to be well defined, if at
least one of the expectations $EX^+$ and $EX^-$ is finite (i.e., $\min(EX^+, EX^-) < \infty$),
in which case $EX$ is defined as

$$EX = EX^+ - EX^-.$$

The expectation $EX$ is said to be finite (equivalently, $X$ is said to be integrable), if
$EX^+ < \infty$ and $EX^- < \infty$, which is equivalent to the requirement $E|X| < \infty$,
since $|X| = X^+ + X^-$ (see [P §2.6]).

• An important analytical "trick" in probability theory is the passage to the
limit under the Lebesgue integral. This operation is justified by the monotone
convergence theorem, Fatou's lemma, and Lebesgue's dominated convergence
theorem. The following tools are fundamental in probability theory: the concept of
uniform integrability, the fundamental inequalities (Chebyshev, Cauchy–Schwarz,
Jensen, Lyapunov, Hölder, Minkowski and others), the Radon–Nikodym theorem,
Fubini's theorem and the "change of variables" theorem for the Lebesgue integral
(see [P §2.6]).

• The dispersion of the random variable $X = X(\omega)$ is defined as

$$DX = E(X - EX)^2.$$

The quantity $\sigma = +\sqrt{DX}$ is known as the standard (linear) deviation of the random
variable $X$ (from the mean value $EX$).
If it exists, the covariance of any given pair of random variables, $(X, Y)$, is
defined as

$$\operatorname{cov}(X, Y) = E(X - EX)(Y - EY).$$

If $X$ and $Y$ are random variables with $0 < DX < \infty$ and $0 < DY < \infty$, then
the quantity

$$\rho(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sqrt{DX \cdot DY}}$$

is known as the correlation coefficient of $X$ and $Y$.
For a given random variable $X$ and an integer $n$, if it exists, the expectation $EX^n$
is called the moment of order $n$, or the $n$th moment, of $X$. The quantity
$E(X)_n = EX(X - 1)\cdots(X - n + 1)$ is called the factorial moment of order $n$.

• If $F = F(x)$ is any distribution function, then the function

$$\varphi(t) = \int_{\mathbb{R}} e^{itx}\,dF(x) \quad \Big( = \int_{\mathbb{R}} \cos tx\,dF(x) + i\int_{\mathbb{R}} \sin tx\,dF(x) \Big), \quad t \in \mathbb{R},$$

is the characteristic function of $F$. In particular, if $X$ is a random variable and $F_X$
is its distribution function, then the characteristic function of $F_X$, namely,

$$\varphi_X(t) = \int_{\mathbb{R}} e^{itx}\,dF_X(x) = Ee^{itX}, \quad t \in \mathbb{R},$$

is also called the characteristic function of the random variable $X = X(\omega)$ (see
[P §2.12]).

• Given any non-negative random variable $X$ with distribution function $F_X = F_X(x)$,
the Laplace transform of $X$ (or, equivalently, of $F_X$) is defined as the
function

$$\widehat{F}_X(\lambda) = \int_{0}^{\infty} e^{-\lambda x}\,dF_X(x) = Ee^{-\lambda X}, \quad \lambda \ge 0.$$

Tables of the most commonly used discrete probability distributions and distributions
with densities can be found in [P §2.3].

• The method of generating functions is particularly useful in the study of discrete
random variables. This method is widely used also in other areas of mathematics as
a convenient tool for studying some special numerical sequences, whose structure
is not immediately obvious.

In probability theory, the generating function, $G(s)$, of the discrete random
variable $X$, which takes the values $0, 1, 2, \ldots$ with probabilities $p_0, p_1, p_2, \ldots$
($p_k \ge 0$, $\sum_{k=0}^{\infty} p_k = 1$), is given by

$$G(s) = Es^X \ \Big( = \sum_{k=0}^{\infty} p_k s^k \Big), \quad |s| \le 1.$$

The distribution of the random variable $X$, namely, $(p_k)_{k \ge 0}$, is uniquely determined
from the generating function $G(s)$ through the formula

$$p_k = \frac{G^{(k)}(0)}{k!}, \quad k = 0, 1, 2, \ldots$$

If the components of the discrete vector-valued random variable $X = (X_1, \ldots, X_d)$
take values in the set $\mathbb{N} = \{0, 1, 2, \ldots\}$, then its generating function, $G(s)$,
with $s = (s_1, \ldots, s_d)$, is defined by:

$$G(s_1, \ldots, s_d) = E s_1^{X_1} \cdots s_d^{X_d} = \sum_{k_1, \ldots, k_d} p_{k_1 \ldots k_d}\, s_1^{k_1} \cdots s_d^{k_d},$$

where $p_{k_1 \ldots k_d} = P\{X_1 = k_1, \ldots, X_d = k_d\}$, $|s_k| \le 1$, $k = 1, \ldots, d$.
If the variables $X_1, \ldots, X_d$ are independent, then

$$G(s_1, \ldots, s_d) = G_1(s_1) \cdots G_d(s_d),$$

where $G_k(s_k) = E s_k^{X_k}$, $k = 1, \ldots, d$.

The above definition of the generating function $G(s)$ assumes that the random
variable $X$ is non-negative and takes values in the set $\mathbb{N} = \{0, 1, 2, \ldots\}$. For various
reasons it is useful to expand this construction also for the case where $X$ takes
positive and negative values, i.e., $P\{X = k\} = p_k$, for $k = 0, \pm 1, \pm 2, \ldots$,
and $\sum_{k=-\infty}^{\infty} p_k = 1$, without supposing that all $p_{-1}, p_{-2}, \ldots$ must vanish. The
generating function, $G(s)$, of any such random variable $X$ is given by the formula

$$G(s) = Es^X = \sum_{k=-\infty}^{\infty} p_k s^k,$$

for those $s$ for which $E|s^X| < \infty$.

Typically, generating functions of the above type are used when working with
the difference, $X = X_1 - X_2$, of two random variables, $X_1$ and $X_2$, that take values
in the set $\mathbb{N} = \{0, 1, 2, \ldots\}$. For example, if $X_1$ and $X_2$ are independent and have
generating functions, respectively, $G_{X_1}(s)$ and $G_{X_2}(s)$, then

$$G_X(s) = G_{X_1 - X_2}(s) = G_{X_1}(s)\, G_{X_2}\Big(\frac{1}{s}\Big).$$

In particular, if $X_i$ is distributed with Poisson law of parameter $\lambda_i$, $i = 1, 2$, then
$G_{X_i}(s) = e^{-\lambda_i (1 - s)}$ and the generating function $G_X(s)$ of $X = X_1 - X_2$ is given by

$$G_X(s) = e^{-(\lambda_1 + \lambda_2) + \lambda_1 s + \lambda_2 (1/s)} = e^{-(\lambda_1 + \lambda_2)}\, e^{\sqrt{\lambda_1 \lambda_2}\,(t + 1/t)},$$

where $t = s\sqrt{\lambda_1 / \lambda_2}$.
It is well known from analysis that, for every fixed $x \in \mathbb{R}$, one can write

$$e^{x(t + 1/t)} = \sum_{k=-\infty}^{\infty} t^k I_k(2x),$$

where $I_k(2x)$ is the modified Bessel function of the first kind of order $k$, namely

$$I_k(2x) = x^k \sum_{r=0}^{\infty} \frac{x^{2r}}{r!\,\Gamma(k + r + 1)}, \quad k = 0, \pm 1, \pm 2, \ldots$$

(Alternatively, one can say that for every fixed $x \in \mathbb{R}$, the generating function of the
sequence $(I_k(2x))_{k = 0, \pm 1, \ldots}$ is $e^{x(t + 1/t)}$.)

Thus, we find that the probability distribution of the random variable $X = X_1 - X_2$
can be expressed as

$$P\{X = k\} = e^{-(\lambda_1 + \lambda_2)} \Big(\frac{\lambda_1}{\lambda_2}\Big)^{k/2} I_k\big(2\sqrt{\lambda_1 \lambda_2}\big),$$

where $k = 0, \pm 1, \pm 2, \ldots$
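
The formula for the law of the difference of two independent Poisson variables can be checked numerically. The following is only a sketch under stated assumptions: the helper names (`bessel_i`, `skellam_pmf`, `difference_pmf`), the truncation levels and the parameter values are illustrative choices, and the symmetry $I_{-k} = I_k$ for integer $k$ is a standard fact about Bessel functions that is not taken from the text above.

```python
import math


def poisson_pmf(lam, k):
    """P{X = k} for a Poisson variable with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def bessel_i(k, z, terms=60):
    """Modified Bessel function I_k(z) for integer k, via its power series.

    Assumption: for integer order one may use I_{-k} = I_k.
    """
    k = abs(k)
    half = z / 2.0
    return sum(half ** (2 * r + k) / (math.factorial(r) * math.factorial(r + k))
               for r in range(terms))


def skellam_pmf(lam1, lam2, k):
    """P{X1 - X2 = k} from the Bessel-function formula."""
    return (math.exp(-(lam1 + lam2)) * (lam1 / lam2) ** (k / 2)
            * bessel_i(k, 2.0 * math.sqrt(lam1 * lam2)))


def difference_pmf(lam1, lam2, k, cutoff=80):
    """P{X1 - X2 = k} by direct (truncated) summation over the values of X2."""
    return sum(poisson_pmf(lam1, j + k) * poisson_pmf(lam2, j)
               for j in range(max(0, -k), cutoff))


if __name__ == "__main__":
    for k in range(-3, 4):
        print(k, skellam_pmf(2.0, 3.0, k), difference_pmf(2.0, 3.0, k))
```

The two columns printed by the script should agree up to rounding, since both compute the same probability in two different ways.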

For other examples related to the calculation of some concrete generating
functions, and for various applications of the method of generating functions, see
Problems 2.6.28, 2.6.32, 7.2.18, and 8.8.19.

• As was noted earlier, the method of generating functions plays a significant role
in several important domains of mathematics; in particular, in discrete mathematics 
and combinatorics. 

In fact, it was the method of generating functions that brought to light the 
algebraic methods for solving various combinatorial problems, thus giving rise to a 
new direction in combinatorics, called algebraic combinatorics. 

In general, many important combinatorial properties, operations and relations 
can be interpreted in such a way that they become algebraic in nature. 

As an illustration of the use of the algebraic properties of certain generating 
functions for the purpose of a concrete combinatorial calculation, consider the 
following lottery-problem. The tickets in a particular lottery are identified by the 
six-digit numbers from 000000 to 999999. Suppose that one must compute the 
probability that a randomly chosen ticket has a number in which the sum of the first 
three digits equals the sum of the last three digits. Clearly, this is a combinatorial 
problem, which comes down to computing the respective number of favorable 
configurations. One may try to compute this number by brute force, i.e., by counting 
those configurations one by one. However, as we are about to see, this number is
quite large (55,252, to be precise) and straight counting would be rather impractical.

In contrast, the method of generating functions allows one to solve fairly quickly
a more general problem: calculate the probability for a randomly selected ticket in
a lottery with $10^{2n}$ tickets, identified with the $2n$-digit numbers from $0$ to $10^{2n} - 1$,
to have a number in which the sum of the first $n$ digits equals the sum of the last
$n$ digits. Assuming that $n = 3$, let $X = (X_1, \ldots, X_6)$ be a vector of independent
random variables, chosen so that $p_k = P\{X_i = k\} = 1/10$, for $k = 0, 1, \ldots, 9$,
and consider the generating function

$$G_{X_i}(s) = \sum_{k=0}^{9} p_k s^k = \frac{1}{10}\,(1 + s + \ldots + s^9) = \frac{1}{10}\,\frac{1 - s^{10}}{1 - s}.$$

Because of the independence of the random variables $X_i$, $1 \le i \le 6$, one can write

$$G_{X_1 + X_2 + X_3}(s) = G_{X_1}(s) \times G_{X_2}(s) \times G_{X_3}(s) = \frac{1}{10^3}\,\frac{(1 - s^{10})^3}{(1 - s)^3}.$$

An analogous expression can also be written for $G_{X_4 + X_5 + X_6}(s)$.
Now consider the random variable $Y = (X_1 + X_2 + X_3) - (X_4 + X_5 + X_6)$.
Clearly, due to the independence, one must have

$$G_Y(s) = G_{X_1 + X_2 + X_3}(s)\, G_{X_4 + X_5 + X_6}\Big(\frac{1}{s}\Big) = \frac{1}{10^6}\,\frac{1}{s^{27}} \Big(\frac{1 - s^{10}}{1 - s}\Big)^6.$$

In addition, the coefficient $q_0$ (for the term $s^0$) in the expansion

$$G_Y(s) = \sum_{k} q_k s^k$$

is nothing but the probability $P\{Y = 0\}$, which is precisely the probability that the
sum of the first three digits on the randomly selected ticket equals the sum of the last
three digits. After a somewhat involved but otherwise straightforward calculation,
from the expansions of $(1 - s^{10})^6$, $(1 - s)^{-6}$ and $\frac{1}{s^{27}}\big(\frac{1 - s^{10}}{1 - s}\big)^6$ into power series (for
the powers $s^k$, $k = 0, \pm 1, \pm 2, \ldots$) one finds that

$$q_0 = \frac{55252}{10^6} = 0.055252$$

(see also Problem 2.6.79).
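
The count of favorable tickets can also be obtained mechanically by multiplying out the polynomial $(1 + s + \ldots + s^9)^n$. The sketch below is an illustration, not the calculation used in the text: it relies on the observation that the coefficient of $s^0$ in $G(s)\,G(1/s)$ equals the sum of the squared coefficients of $G$, and the function names are hypothetical.

```python
def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def digit_sum_counts(n):
    """counts[m] = number of n-digit strings with digit sum m,
    i.e. the coefficients of (1 + s + ... + s^9)^n."""
    digit = [1] * 10
    counts = [1]
    for _ in range(n):
        counts = poly_mul(counts, digit)
    return counts


def lucky_tickets(n):
    """Number of 2n-digit tickets whose first n digits and last n digits
    have equal sums: the coefficient of s^0 in C(s) * C(1/s)."""
    counts = digit_sum_counts(n)
    return sum(c * c for c in counts)


if __name__ == "__main__":
    favorable = lucky_tickets(3)
    print(favorable, favorable / 10**6)
```

For $n = 3$ this reproduces the number of favorable configurations quoted above.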
In general, the generating function associated with an arbitrary numerical
sequence $a = (a_n)_{n \ge 0}$ is defined as the (formal) power series

$$G_a(x) = a_0 + a_1 x + a_2 x^2 + \ldots, \quad x \in \mathbb{R}.$$

If the above series has a non-trivial radius of convergence, then it would define a true
(i.e., not just formal) function (on the respective interval of convergence). According
to the general theory of generating functions, the function $G_a(x)$ is nothing but
a special "encryption" of the sequence $a = (a_n)_{n \ge 0}$, in that there is a bijective
correspondence

$$(a_n) \leftrightarrow G_a(x).$$

It is a trivial matter to check that if $(b_n) \leftrightarrow G_b(x)$ and $c$ is some constant, then

$$(a_n + c\,b_n) \leftrightarrow G_a(x) + c\,G_b(x).$$

Perhaps the most important feature of the bijection "$\leftrightarrow$" is the relation

$$\Big(\sum_{i=0}^{n} a_i b_{n-i}\Big)_{n \ge 0} \leftrightarrow G_a(x) \times G_b(x),$$

which simply says that under the bijection "$\leftrightarrow$" the convolution of the sequences
$a = (a_n)_{n \ge 0}$ and $b = (b_n)_{n \ge 0}$ corresponds to the multiplication of their respective
generating functions. It is not hard to see that the formal operations introduced above
(addition, multiplication by scalars and multiplication between formal series) possess
the associativity, commutativity and distributivity properties, so that the space of all
formal series can be treated as an algebraic structure; for more details, see [27, 46,
110, 111].

In addition to the power series $G_a(x)$, constructed from the sequence $(a_n)$, one
can define the exponential generating function

$$E_a(x) = \sum_{n \ge 0} a_n \frac{x^n}{n!},$$

the series again being understood as a formal series. Just as in the case of generating
functions, one has the one-to-one correspondence

$$(a_n) \leftrightarrow E_a(x)$$

and the following properties hold:

$$(a_n + c\,b_n) \leftrightarrow E_a(x) + c\,E_b(x),$$

$$\Big(\sum_{i=0}^{n} C_n^i\, a_i b_{n-i}\Big)_{n \ge 0} \leftrightarrow E_a(x)\,E_b(x).$$
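
The two product rules (ordinary convolution for generating functions, binomial convolution for exponential generating functions) can be checked exactly on finite sequences, which are treated here as sequences that vanish beyond their last entry. This is a small sketch with hypothetical helper names; exact rational arithmetic is used so that the comparison is an identity rather than an approximation.

```python
from fractions import Fraction
from math import comb, factorial


def _get(seq, i):
    """Entry i of a finite sequence, extended by zeros."""
    return seq[i] if 0 <= i < len(seq) else 0


def convolution(a, b):
    """c_n = sum_{i=0}^{n} a_i b_{n-i}."""
    size = len(a) + len(b) - 1
    return [sum(_get(a, i) * _get(b, n - i) for i in range(n + 1))
            for n in range(size)]


def binomial_convolution(a, b):
    """c_n = sum_{i=0}^{n} C_n^i a_i b_{n-i}."""
    size = len(a) + len(b) - 1
    return [sum(comb(n, i) * _get(a, i) * _get(b, n - i) for i in range(n + 1))
            for n in range(size)]


def evaluate_gf(a, x):
    """G_a(x) = sum a_n x^n for a finite integer sequence a."""
    return sum(Fraction(an) * x**n for n, an in enumerate(a))


def evaluate_egf(a, x):
    """E_a(x) = sum a_n x^n / n! for a finite integer sequence a."""
    return sum(Fraction(an, factorial(n)) * x**n for n, an in enumerate(a))


if __name__ == "__main__":
    a, b, x = [1, 2, 3], [4, 0, 5, 6], Fraction(1, 3)
    print(evaluate_gf(convolution(a, b), x) == evaluate_gf(a, x) * evaluate_gf(b, x))
    print(evaluate_egf(binomial_convolution(a, b), x)
          == evaluate_egf(a, x) * evaluate_egf(b, x))
```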


Now we turn to some examples. If the sequence $(a_n)_{n \ge 0}$ is chosen so that $a_n = 1$,
$n \ge 0$, then

$$G_a(x) = \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}, \quad |x| < 1,$$

and one has the formula

$$[G_a(x)]^N = (1 - x)^{-N} = \sum_{n=0}^{\infty} C_{N+n-1}^{n}\, x^n,$$

which comes as a result of the following argument.
What is the coefficient for $x^n$ in the formal expansion of $(1 + x + x^2 + \ldots)^N$?
Since

$$(1 + x + x^2 + \ldots)^N = (1 + x + x^2 + \ldots)(1 + x + x^2 + \ldots) \cdots (1 + x + x^2 + \ldots),$$

it is clear that if one extracts from the first factor $x^{n_1}$, extracts from the second factor
$x^{n_2}$, ..., and, finally, extracts from the $N$th factor the term $x^{n_N}$, then one would
end up with the term $x^{n_1} x^{n_2} \cdots x^{n_N} = x^n$. The total number of all such choices
$(n_1, n_2, \ldots, n_N)$, with $n_1 + n_2 + \ldots + n_N = n$ and $n_i \ge 0$, is simply the number of
all non-negative integer-valued solutions to the equation $n_1 + n_2 + \ldots + n_N = n$,
which, according to Problem 1.1.6, is precisely $C_{N+n-1}^{n}$.

This shows that the generating function of the sequence $(C_{N+n-1}^{n})_{n \ge 0}$ is simply
$(1 - x)^{-N}$, $|x| < 1$. In particular, the following identity must hold:

$$\sum_{n=0}^{\infty} C_{N+n-1}^{n}\, 2^{-n} = 2^N.$$

Furthermore, the generating function of the sequence $(C_N^n)_{n \ge 0}$, with the understanding
that $C_N^n = 0$, $n > N$, is nothing but $(1 + x)^N$; in other words,

$$(1 + x)^N = \sum_{n=0}^{N} C_N^n\, x^n.$$

The proof of the last relation is analogous to the proof of the formula for the
generating function of the sequence $(C_{N+n-1}^{n})_{n \ge 0}$.
Consider the identity

$$(1 + x)^N (1 + x)^M = (1 + x)^{N+M},$$

and observe that after expanding both sides in the powers $x^k$ and comparing the
respective terms one finds that

$$\sum_{j=0}^{k} C_N^j\, C_M^{k-j} = C_{N+M}^{k}.$$

The last identity is known as "Vandermonde's convolution," or the hypergeometric
identity (see also Problem 1.2.2). Its derivation is a good illustration of the use of
the method of generating functions for deriving various combinatorial identities.
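
Both combinatorial facts used above lend themselves to a quick brute-force check for small parameters. The sketch below assumes the reading $C_{N+n-1}^{n}$ for the number of non-negative integer solutions of $n_1 + \ldots + n_N = n$ and the form $\sum_{j=0}^{k} C_N^j C_M^{k-j} = C_{N+M}^{k}$ of Vandermonde's convolution; the function names are illustrative only.

```python
from itertools import product
from math import comb


def count_solutions(N, n):
    """Brute-force count of tuples (n_1, ..., n_N) of non-negative integers
    with n_1 + ... + n_N = n."""
    return sum(1 for t in product(range(n + 1), repeat=N) if sum(t) == n)


def vandermonde_lhs(N, M, k):
    """sum_{j=0}^{k} C_N^j C_M^{k-j}; math.comb returns 0 when the lower
    index exceeds the upper one, matching the convention C_N^n = 0 for n > N."""
    return sum(comb(N, j) * comb(M, k - j) for j in range(k + 1))


def negative_binomial_partial_sum(N, terms):
    """Partial sum of sum_n C_{N+n-1}^n 2^{-n}, which should approach 2^N."""
    return sum(comb(N + n - 1, n) * 0.5**n for n in range(terms))


if __name__ == "__main__":
    print(count_solutions(3, 4), comb(3 + 4 - 1, 4))
    print([vandermonde_lhs(3, 4, k) == comb(7, k) for k in range(8)])
    print(negative_binomial_partial_sum(3, 200))
```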

To conclude the discussion of the topic "generating functions," we now turn to
the Stirling numbers of the second kind, $S_N^n$, and the Bell numbers, $B_N$, which were
introduced earlier. Recall that $S_N^n$ gives the number of all partitions $\mathcal{D}$ of a set $A$
with $N$ elements, such that $\mathcal{D}$ consists of exactly $n$ classes. Recall also that
$B_N = \sum_{n=1}^{N} S_N^n$ gives the number of all possible partitions of the set $A$ with $|A| = N$.

In Sect. A.1 we established the formula $n^N = \sum_{k=1}^{N} S_N^k\,(n)_k$ by using combinatorial
considerations. (Recall that $S_N^1 = S_N^N = 1$, $S_N^0 = 0$ and $S_N^n = 0$ for $n > N$.)
It is easy to see from this formula that for any $N \ge 1$ the polynomial

$$P_N(x) = x^N - \sum_{n=1}^{N} S_N^n\,(x)_n, \quad x \in \mathbb{R},$$

has roots $x = 1, \ldots, N$. Since $x = 0$ is also a root, a polynomial of degree at most $N$
has $N + 1$ distinct roots, and it follows that $P_N(x) \equiv 0$.
Consequently, for any $N \ge 1$ and $x \in \mathbb{R}$ one has

$$x^N = \sum_{n=1}^{N} S_N^n\,(x)_n.$$

If we set $S_0^0 = 1$, $(x)_0 = 1$ and $S_N^0 = 0$ for $N \ge 1$, one finds that

$$x^N = \sum_{n=0}^{N} S_N^n\,(x)_n$$

for all $N = 0, 1, 2, \ldots$ and all $x \in \mathbb{R}$.
With the above relations in mind, one can write

$$\sum_{n \ge 0} \Big(\sum_{N \ge 0} S_N^n \frac{y^N}{N!}\Big)(x)_n = \sum_{N \ge 0} \Big(\sum_{n \ge 0} S_N^n\,(x)_n\Big)\frac{y^N}{N!} = \sum_{N \ge 0} \frac{(xy)^N}{N!}$$

$$= e^{xy} = (e^y)^x = \big(1 + (e^y - 1)\big)^x = \sum_{n \ge 0} \frac{1}{n!}\,(e^y - 1)^n\,(x)_n,$$

due to the Taylor expansion $(1 + z)^x = \sum_{n \ge 0} \frac{z^n}{n!}\,(x)_n$. By comparing the left and
right sides of the above chain of identities, one finds that, for every $n \ge 0$, the
exponential generating function for the Stirling numbers of the second kind, $S_N^n$,
$N \ge 0$, is given by the formula

$$\sum_{N \ge 0} S_N^n \frac{y^N}{N!} = \frac{1}{n!}\,(e^y - 1)^n, \quad n \ge 0$$

(with the convention $S_0^0 = 1$, $S_N^0 = 0$ for $N \ge 1$ and $S_N^n = 0$ for $N < n$).
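In much the same way one can obtain the generating function of the sequence
$(S_N^n)_{n \ge 0}$:

$$\sum_{n \ge 0} S_N^n x^n = e^{-x} \sum_{m \ge 0} \frac{m^N x^m}{m!}. \qquad (**)$$

Indeed,

$$x^n e^x = \sum_{i=0}^{\infty} \frac{x^{i+n}}{i!} = \sum_{m \ge 0} (m)_n \frac{x^m}{m!}.$$

Furthermore, the formula $m^N = \sum_{n \ge 0} S_N^n\,(m)_n$ yields

$$e^x \sum_{n \ge 0} S_N^n x^n = \sum_{n \ge 0} S_N^n\, x^n e^x = \sum_{n \ge 0} S_N^n \Big(\sum_{m \ge 0} (m)_n \frac{x^m}{m!}\Big) = \sum_{m \ge 0} \Big(\sum_{n \ge 0} S_N^n\,(m)_n\Big)\frac{x^m}{m!} = \sum_{m \ge 0} \frac{m^N x^m}{m!},$$

which gives (**).
With $x = 1$, (**) gives Dobinski's formula for the Bell numbers:

$$B_N = \frac{1}{e} \sum_{m \ge 0} \frac{m^N}{m!}.$$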

The definition of the Stirling numbers of the second kind, $S_N^n$, was based on the
combinatorial interpretation of these quantities as the total number of partitions of
a set $A$ that has $N$ elements into $n$ disjoint classes. Then we showed that
$x^N = \sum_{n=0}^{N} S_N^n\,(x)_n$, for every $N \ge 0$.
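
The relations for the Stirling numbers of the second kind can be verified on small cases. The sketch below builds the numbers from the recurrence $S_N^n = S_{N-1}^{n-1} + n\,S_{N-1}^n$, which is a standard fact assumed here rather than something derived in the text, and then checks the expansion of $x^N$ in falling factorials and Dobinski's formula $B_N = e^{-1}\sum_{m \ge 0} m^N/m!$ (the latter only approximately, with a truncated series). All function names are illustrative.

```python
from math import exp, factorial


def stirling2_table(max_n):
    """S[N][n] for 0 <= n <= N <= max_n, from the (assumed) recurrence
    S_N^n = S_{N-1}^{n-1} + n * S_{N-1}^n with S_0^0 = 1."""
    S = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    S[0][0] = 1
    for N in range(1, max_n + 1):
        for n in range(1, N + 1):
            S[N][n] = S[N - 1][n - 1] + n * S[N - 1][n]
    return S


def falling_factorial(x, n):
    """(x)_n = x (x - 1) ... (x - n + 1), with (x)_0 = 1."""
    result = 1
    for j in range(n):
        result *= x - j
    return result


def bell_number(N, S):
    """B_N as the sum of the N-th row of the Stirling table."""
    return sum(S[N])


def dobinski(N, terms=60):
    """Truncated Dobinski series e^{-1} * sum_m m^N / m!."""
    return exp(-1.0) * sum(m**N / factorial(m) for m in range(terms))


if __name__ == "__main__":
    S = stirling2_table(8)
    print([bell_number(N, S) for N in range(7)])
    print(dobinski(5))
```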

The algebraic Stirling numbers of the first kind, $s_N^n$, $0 \le n \le N$, can be defined
by the relation

$$(x)_N = \sum_{n=0}^{N} s_N^n\, x^n. \qquad (*)$$

The combinatorial interpretation of the numbers $s_N^n$ can be explained as follows. Let
$\pi = (\pi_1, \ldots, \pi_N)$ be any permutation of the numbers $(1, \ldots, N)$ and let $c_N^n$ denote
the number of permutations with exactly $n$ cycles. (For example, the permutation
(5 TẸ £ 3) has two cycles.) One can then show that

$$c_N^n = c_{N-1}^{n-1} + (N - 1)\,c_{N-1}^{n}$$

(with $c_0^0 = 1$), and conclude that

$$\sum_{n=0}^{N} c_N^n\, x^n = x(x + 1) \cdots (x + N - 1).$$

By comparing the above generating function for the sequence $c_N^0, c_N^1, \ldots, c_N^N$ with
the generating function in (*), for the sequence of Stirling numbers of the first kind,
$s_N^0, s_N^1, \ldots, s_N^N$, one finds that

$$c_N^n = (-1)^{N-n}\, s_N^n.$$

This shows that the Stirling numbers of the first kind $s_N^n$ coincide up to their sign
with the number of permutations of the set $(1, \ldots, N)$ that have precisely $n$ cycles.
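
For small $N$ the link between permutations with $n$ cycles and the coefficients of the falling factorial $(x)_N$ can be confirmed by enumeration. The sketch below is illustrative: permutations are taken on $\{0, \ldots, N-1\}$ for convenience, the sign rule is assumed in the form $c_N^n = (-1)^{N-n} s_N^n$, and the function names are hypothetical.

```python
from itertools import permutations


def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple, perm[i] = image of i."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return cycles


def cycle_numbers_brute(N):
    """counts[n] = number of permutations of N points with exactly n cycles."""
    counts = [0] * (N + 1)
    for perm in permutations(range(N)):
        counts[cycle_count(perm)] += 1
    return counts


def cycle_numbers_recurrence(N):
    """Same numbers via c_N^n = c_{N-1}^{n-1} + (N-1) c_{N-1}^n, c_0^0 = 1."""
    row = [1]
    for M in range(1, N + 1):
        new = [0] * (M + 1)
        for n in range(M + 1):
            left = row[n - 1] if n >= 1 else 0
            same = row[n] if n < len(row) else 0
            new[n] = left + (M - 1) * same
        row = new
    return row


def falling_factorial_coeffs(N):
    """Coefficients s_N^n of (x)_N = x (x-1) ... (x-N+1), index = power of x."""
    coeffs = [1]
    for j in range(N):
        new = [0] * (len(coeffs) + 1)
        for n, c in enumerate(coeffs):
            new[n + 1] += c      # contribution of x * c x^n
            new[n] -= j * c      # contribution of (-j) * c x^n
        coeffs = new
    return coeffs


if __name__ == "__main__":
    print(cycle_numbers_brute(4), cycle_numbers_recurrence(4),
          falling_factorial_coeffs(4))
```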

The generating function $G(s) = Es^X$, $|s| \le 1$, associated with a discrete random
variable $X$ that takes values in the set $\mathbb{N} = \{0, 1, 2, \ldots\}$ with probabilities
$p_k = P\{X = k\}$, $k \in \mathbb{N}$, can be written as

$$G(s) = \sum_{k=0}^{\infty} p_k s^k,$$

and therefore can be identified with the generating function of the numerical
sequence $(p_k)_{k \ge 0}$.

Closely related to the notion of generating function is the notion of moment
generating function (see Problem 2.6.32). The moment generating function of the
random variable $X$ is defined as

$$M(s) = Ee^{sX}.$$

Notice that if $X \ge 0$ (a.e.) the expectation $Ee^{sX}$ is well defined (and finite) for every
$s \le 0$. Assuming that all moments $m^{(k)} = EX^k$, $k \ge 1$, are finite, the moment
generating function $M(s)$ can be expanded into the (formal) series

$$M(s) = \sum_{k=0}^{\infty} m^{(k)} \frac{s^k}{k!},$$

which is nothing but the exponential generating function for the sequence $(m^{(k)})_{k \ge 0}$.
As was noted earlier, in addition to the usual moments $m^{(k)} = EX^k$, in probability
theory it is often useful to work with the factorial moments

$$(m)_k = E(X)_k = EX(X - 1) \cdots (X - k + 1)$$

and the binomial moments

$$b_{(k)} = \frac{E(X)_k}{k!} = \frac{(m)_k}{k!}$$

(the term "binomial" is justified by the relation $b_{(k)} = EC_X^k$, where $C_X^k = (X)_k / k!$).
The sequence of factorial moments $((m)_k)_{k \ge 0}$ and the sequence of binomial
moments $(b_{(k)})_{k \ge 0}$ give rise, respectively, to the exponential generating function

$$(M)(s) = \sum_{k=0}^{\infty} (m)_k \frac{s^k}{k!}$$

and the generating function

$$B(s) = \sum_{k=0}^{\infty} b_{(k)}\, s^k.$$

Clearly, one must have

$$M(s) = G(e^s) \quad\text{and}\quad (M)(s) = B(s) = G(s + 1).$$

It is useful to point out that the following two identities, established earlier, in which
$S_N^n$ and $s_N^n$ stand for the Stirling numbers, respectively, of the second and the first
kind,

$$x^N = \sum_{n=0}^{N} S_N^n\,(x)_n, \qquad (x)_N = \sum_{n=0}^{N} s_N^n\, x^n,$$

entail the following connection between the moments $m^{(n)} = EX^n$ and the factorial
moments $(m)_n = E(X)_n$, $n \ge 0$:

$$m^{(N)} = \sum_{n=0}^{N} S_N^n\,(m)_n, \qquad (m)_N = \sum_{n=0}^{N} s_N^n\, m^{(n)}.$$
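
The passage between ordinary and factorial moments can be illustrated on a finite distribution with exact arithmetic. In the sketch below the probability mass function is an arbitrary illustrative choice, and both families of Stirling numbers are generated from their standard recurrences (an assumption, since the text defines them differently); the function names are hypothetical.

```python
from fractions import Fraction


def stirling_tables(max_n):
    """Return (S, s): Stirling numbers of the second kind and (signed) of the
    first kind, from the recurrences S_N^n = S_{N-1}^{n-1} + n S_{N-1}^n and
    s_N^n = s_{N-1}^{n-1} - (N-1) s_{N-1}^n."""
    S = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    s = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    S[0][0] = s[0][0] = 1
    for N in range(1, max_n + 1):
        for n in range(1, N + 1):
            S[N][n] = S[N - 1][n - 1] + n * S[N - 1][n]
            s[N][n] = s[N - 1][n - 1] - (N - 1) * s[N - 1][n]
    return S, s


def falling_factorial(x, n):
    """(x)_n = x (x - 1) ... (x - n + 1)."""
    result = 1
    for j in range(n):
        result *= x - j
    return result


def moment(pmf, N):
    """m^(N) = E X^N for a finite distribution {value: probability}."""
    return sum(p * k**N for k, p in pmf.items())


def factorial_moment(pmf, n):
    """(m)_n = E (X)_n for a finite distribution {value: probability}."""
    return sum(p * falling_factorial(k, n) for k, p in pmf.items())


if __name__ == "__main__":
    pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(1, 4), 5: Fraction(1, 4)}
    S, s = stirling_tables(6)
    for N in range(5):
        via_factorial = sum(S[N][n] * factorial_moment(pmf, n) for n in range(N + 1))
        print(N, moment(pmf, N), via_factorial)
```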


• It is useful to notice that many special sequences in mathematics (e.g.,
Bernoulli, Euler, etc.) and special polynomials (Bernoulli, Euler, Hermite, Appell,
etc.) are defined in terms of generating functions.

(a) The Bernoulli numbers, $b_0, b_1, b_2, \ldots$, and the Bernoulli polynomials,
$B_0(x), B_1(x), B_2(x), \ldots$, are defined through the respective exponential generating
functions as:

$$\frac{s}{e^s - 1} = \sum_{n=0}^{\infty} b_n \frac{s^n}{n!} \quad\text{and}\quad \frac{s\,e^{xs}}{e^s - 1} = \sum_{n=0}^{\infty} B_n(x) \frac{s^n}{n!}.$$

(The first several Bernoulli numbers are: $b_0 = 1$, $b_1 = -\frac{1}{2}$, $b_2 = \frac{1}{6}$, $b_4 = -\frac{1}{30}$,
$b_6 = \frac{1}{42}$, $b_8 = -\frac{1}{30}$; $B_0(x) = 1$, $B_1(x) = x - \frac{1}{2}$, $B_2(x) = x^2 - x + \frac{1}{6}$,
$B_3(x) = x^3 - \frac{3}{2}x^2 + \frac{1}{2}x$; see [109], for example.) All odd-numbered Bernoulli numbers
(except for $b_1 = -\frac{1}{2}$) are equal to zero. What follows is a list of some key properties
and relations:
1. $b_N = \sum_{n=0}^{N} C_N^n\, b_{N-n}$, $N = 2, 3, \ldots$;
2. all numbers $b_N$ are rational;
3. $B_N(0) = b_N$, $B_N(1) = (-1)^N b_N$, $N \ge 0$;
4. $B_N(x) = \sum_{n=0}^{N} C_N^n\, b_n\, x^{N-n}$, $N \ge 1$;
5. $B_N'(x) = N B_{N-1}(x)$, $N \ge 1$.


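(b) The Euler numbers, $e_0, e_1, e_2, \ldots$, and the Euler polynomials $E_0(x), E_1(x),
E_2(x), \ldots$, are also defined through the exponential generating functions as:

$$\frac{2e^s}{e^{2s} + 1} = \sum_{n=0}^{\infty} e_n \frac{s^n}{n!} \quad\text{and}\quad \frac{2e^{xs}}{e^s + 1} = \sum_{n=0}^{\infty} E_n(x) \frac{s^n}{n!}.$$

Since $\frac{2e^s}{e^{2s} + 1} = \frac{1}{\cosh s}$, the exponential generating function for the sequence of Euler
numbers $e_0, e_1, e_2, \ldots$ is simply $\frac{1}{\cosh s}$.

The above definitions imply that:
1. $e_N = 2^N E_N(\frac{1}{2})$, $N \ge 0$;
2. $E_N(x) = \sum_{n=0}^{N} C_N^n\, \frac{e_n}{2^n}\,(x - \frac{1}{2})^{N-n}$, $N \ge 0$;
3. $E_N'(x) = N E_{N-1}(x)$, $N \ge 1$;
4. all odd-numbered Euler numbers are equal to zero, while even-numbered
Euler numbers are integers.
The first several Euler numbers can be computed as: $e_0 = 1$, $e_2 = -1$, $e_4 = 5$,
$e_6 = -61$, $e_8 = 1{,}385$ (see [109]).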
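
The listed values of the Bernoulli and Euler numbers can be regenerated from recurrences that follow from the generating functions. The sketch below is one possible way to do it, under two assumptions spelled out in the comments: property 1 for the Bernoulli numbers is rearranged into $\sum_{k=0}^{N-1} C_N^k b_k = 0$ for $N \ge 2$, and the Euler numbers are obtained by matching coefficients in $\cosh s \cdot \sum_n e_n s^n/n! = 1$. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(count):
    """b_0, ..., b_{count-1} as exact fractions.

    Assumption: b_N = sum_{k=0}^{N} C_N^k b_k (N >= 2) is rewritten as
    N * b_{N-1} = -sum_{k=0}^{N-2} C_N^k b_k.
    """
    b = [Fraction(1)]
    for N in range(2, count + 1):
        b.append(-Fraction(1, N) * sum(comb(N, k) * b[k] for k in range(N - 1)))
    return b


def bernoulli_poly(N, x, b):
    """B_N(x) = sum_{n=0}^{N} C_N^n b_n x^{N-n} (property 4 above)."""
    return sum(comb(N, n) * b[n] * x ** (N - n) for n in range(N + 1))


def euler_numbers(count):
    """e_0, ..., e_{count-1} as integers.

    Assumption: from cosh(s) * sum_n e_n s^n / n! = 1 one gets, for n >= 1,
    e_n = -sum over even k in [2, n] of C_n^k e_{n-k}.
    """
    e = [1]
    for n in range(1, count):
        e.append(-sum(comb(n, k) * e[n - k] for k in range(2, n + 1, 2)))
    return e


if __name__ == "__main__":
    print(bernoulli_numbers(9))
    print(euler_numbers(9))
```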
(c) The Hermite polynomials are defined somewhat differently in analysis and
probability theory.
The Hermite polynomials, $H_n(x)$, $n \ge 0$, of the type commonly used in analysis,
are defined as

$$H_n(x) = (-1)^n \frac{D^n \psi(x)}{\psi(x)}, \quad n \ge 0,$$

where $\psi(x) = e^{-x^2}$. The respective exponential generating function is given by

$$\sum_{n=0}^{\infty} H_n(x) \frac{s^n}{n!} = e^{2xs - s^2}, \quad s \in \mathbb{R}, \ x \in \mathbb{R}.$$

The Hermite polynomials, $He_n(x)$, $n \ge 0$, of the type commonly used in
probability theory, are defined as:

$$He_n(x) = (-1)^n \frac{D^n \varphi(x)}{\varphi(x)}, \quad n \ge 0,$$

where $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ is the density of the standard normal distribution $\mathcal{N}(0, 1)$. (Note
that in [P §2.11], and commonly in probability theory, the above polynomials
$He_n(x)$ are denoted by $H_n(x)$.) The exponential generating function associated with
the sequence $He_n(x)$, $n \ge 0$, is given by:

$$\sum_{n=0}^{\infty} He_n(x) \frac{s^n}{n!} = e^{xs - s^2/2}, \quad s \in \mathbb{R}, \ x \in \mathbb{R}.$$

One can easily verify the relation

$$He_n(x) = 2^{-n/2} H_n(2^{-1/2} x).$$

The first several Hermite polynomials can be computed as:

$H_0(x) = 1$, $He_0(x) = 1$,
$H_1(x) = 2x$, $He_1(x) = x$,
$H_2(x) = 4x^2 - 2$, $He_2(x) = x^2 - 1$,
$H_3(x) = 8x^3 - 12x$, $He_3(x) = x^3 - 3x$.
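
The two families of Hermite polynomials and the scaling relation between them, $He_n(x) = 2^{-n/2} H_n(2^{-1/2}x)$, can be checked numerically. The sketch below generates the polynomials from the three-term recurrences $H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x)$ and $He_{k+1}(x) = x\,He_k(x) - k\,He_{k-1}(x)$; these recurrences are standard facts assumed here, not statements from the text, and the function names are illustrative.

```python
import math


def hermite_coeffs(n, probabilists=False):
    """Coefficients (lowest degree first) of H_n, or of He_n if probabilists=True.

    Assumed recurrences: H_{k+1} = 2x H_k - 2k H_{k-1},
                         He_{k+1} = x He_k - k He_{k-1}.
    """
    scale = 1 if probabilists else 2
    prev, cur = [1], [0, scale]
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + [scale * c for c in cur]      # (scale * x) * current polynomial
        for i, c in enumerate(prev):
            nxt[i] -= scale * k * c               # minus (scale * k) * previous one
        prev, cur = cur, nxt
    return cur


def evaluate(coeffs, x):
    """Evaluate a polynomial given by its coefficient list (Horner's scheme)."""
    value = 0.0
    for c in reversed(coeffs):
        value = value * x + c
    return value


if __name__ == "__main__":
    for n in range(4):
        print(n, hermite_coeffs(n), hermite_coeffs(n, probabilists=True))
    x, n = 0.7, 5
    lhs = evaluate(hermite_coeffs(n, probabilists=True), x)
    rhs = 2 ** (-n / 2) * evaluate(hermite_coeffs(n), x / math.sqrt(2))
    print(lhs, rhs)
```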


A more general version of the Hermite polynomials, written as $He_n(x, t)$, $n \ge 0$,
$x \in \mathbb{R}$, $t \in \mathbb{R}_+$, can be defined through the relation

$$\sum_{n=0}^{\infty} He_n(x, t) \frac{s^n}{n!} = e^{xs - \frac{s^2}{2} t}, \quad s \in \mathbb{R}, \ x \in \mathbb{R}.$$

The polynomials $He_n(x, t)$, $n \ge 0$, play an important role in the study of the
Brownian motion, due to the following property: if $B = (B_t)_{t \ge 0}$ is any standard
Brownian motion, then the following processes can be claimed to be martingales
relative to the filtration of $B = (B_t)_{t \ge 0}$:

$$(He_n(B_t, t))_{t \ge 0}, \ \text{for any } n \ge 0, \quad\text{and}\quad \big(e^{sB_t - \frac{s^2}{2} t}\big)_{t \ge 0}, \ \text{for any } s \in \mathbb{R}.$$

Note that in the literature the polynomials $He_n(x, t)$ are usually written as $H_n(x, t)$,
the exact meaning being made clear from the context.
(d) Suppose that $X$ is some random variable and the associated generating
function,

$$G(s) = Ee^{sX},$$

is finite for all $|s| < \lambda$, for some $\lambda > 0$.
We now define the function

$$A(s, x) = \frac{e^{sx}}{G(s)}, \quad x \in \mathbb{R}, \ |s| < \lambda.$$

In actuarial and financial mathematics the map $x \mapsto \frac{e^{sx}}{G(s)}$ is called Esscher's transform
(of the random variable $X$); see [P §7.11]. The function $A(s, x)$ gives rise to the
Appell polynomials (also known as Sheffer polynomials) $Q_0(x), Q_1(x), \ldots$ through
the expansion

$$A(s, x) = \sum_{k=0}^{\infty} Q_k(x) \frac{s^k}{k!}.$$

In other words, $A(s, x) = \frac{e^{sx}}{G(s)}$ is simply written as the generating function of the
sequence of polynomials $(Q_k(x))_{k \ge 0}$.
The generating function of a random variable $X$ that is uniformly distributed in
the interval $[0, 1]$ is

$$G(s) = Ee^{sX} = \frac{e^s - 1}{s}.$$

Consequently, in this special case one has

$$A(s, x) = \frac{s\,e^{sx}}{e^s - 1}$$

and the Appell polynomials $Q_k(x)$ are nothing but the Bernoulli polynomials
$B_k(x)$, considered earlier.

If $X$ is a Bernoulli random variable with $P\{X = 1\} = P\{X = 0\} = 1/2$, then
its generating function is

$$G(s) = Ee^{sX} = \frac{e^s + 1}{2}$$

and, consequently,

$$A(s, x) = \frac{2e^{sx}}{e^s + 1},$$

so that in this case the Appell polynomials coincide with the Euler polynomials.
A standard normal ($\mathcal{N}(0, 1)$) random variable $X$ has generating function

$$G(s) = Ee^{sX} = e^{s^2/2},$$

and it is easy to check that in this case one has

$$A(s, x) = e^{sx - s^2/2},$$


and that the Appell polynomials $Q_k(x)$ coincide with the Hermite polynomials
$He_k(x)$.
Next, let $\kappa_1, \kappa_2, \ldots$ denote the cumulants (a.k.a. semi-invariants) of the random
variable $X$. The following relations are easy to verify:

$Q_0(x) = 1$,
$Q_1(x) = x - \kappa_1$,
$Q_2(x) = (x - \kappa_1)^2 - \kappa_2$,
$Q_3(x) = (x - \kappa_1)^3 - 3\kappa_2 (x - \kappa_1) - \kappa_3$.

In the special case where $X \sim \mathcal{N}(0, 1)$, the cumulants are $\kappa_1 = 0$, $\kappa_2 = 1$, and
$\kappa_3 = \kappa_4 = \ldots = 0$. As a result, one can write:

$Q_0(x) = He_0(x) = 1$,
$Q_1(x) = He_1(x) = x$,
$Q_2(x) = He_2(x) = x^2 - 1$,
$Q_3(x) = He_3(x) = x^3 - 3x$.

Notice that in order to claim that the polynomials $Q_k(x)$, $k = 1, \ldots, n$, are
uniquely defined it is enough to require that $E|X|^n < \infty$. Furthermore, one has
(with the understanding that $Q_0(x) = 1$):

$$Q_k'(x) = k\,Q_{k-1}(x), \quad 1 \le k \le n.$$

The above identities are known as the Appell relations.

• Given any non-negative random variable $X$, defined on $(\Omega, \mathcal{F}, P)$, and given
any $\sigma$-sub-algebra $\mathcal{G} \subseteq \mathcal{F}$, the conditional expectation of $X$ relative to $\mathcal{G}$ is any
non-negative (not necessarily finite, i.e., with values in the extended real line $\overline{\mathbb{R}}$)
random variable $E(X \mid \mathcal{G}) = E(X \mid \mathcal{G})(\omega)$ that has the following two properties:

1. $E(X \mid \mathcal{G})$ is $\mathcal{G}$-measurable;

2. for every set $A \in \mathcal{G}$ one has:

$$E[X I_A] = E[E(X \mid \mathcal{G})\, I_A].$$

For a general (i.e., not necessarily non-negative) random variable $X$ ($= X^+ - X^-$)
the conditional expectation of $X$ relative to the $\sigma$-sub-algebra $\mathcal{G} \subseteq \mathcal{F}$ is considered
to be well defined if one has ($P$-a.e.)

$$\min[E(X^+ \mid \mathcal{G})(\omega), E(X^- \mid \mathcal{G})(\omega)] < \infty,$$

in which case one can write

$$E(X \mid \mathcal{G})(\omega) = E(X^+ \mid \mathcal{G})(\omega) - E(X^- \mid \mathcal{G})(\omega).$$

If $X(\omega) = I_A(\omega)$, i.e., if $X$ is the indicator of the set $A \in \mathcal{F}$, the conditional
expectation $E(I_A \mid \mathcal{G}) = E(I_A \mid \mathcal{G})(\omega)$ is usually written as $P(A \mid \mathcal{G})$, or as
$P(A \mid \mathcal{G})(\omega)$, and is called the conditional probability of $A$ relative to the $\sigma$-algebra
$\mathcal{G} \subseteq \mathcal{F}$.

If the $\sigma$-algebra $\mathcal{G}$ is generated by the random element $Y = Y(\omega)$ (i.e., $\mathcal{G} = \mathcal{G}_Y = \sigma(Y)$),
the quantities $E(X \mid \mathcal{G}_Y)$ and $P(A \mid \mathcal{G}_Y)$ are usually written as $E(X \mid Y)$
and $P(A \mid Y)$ and are referred to, respectively, as the conditional expectation of $X$
given $Y$ and the conditional probability of the event $A$ given $Y$. (See [P §2.7].)

• Just as in mathematical analysis one deals with various types of convergence, in
probability theory, too, one deals with various types of convergence for sequences of
random variables: convergence in probability ($X_n \xrightarrow{P} X$); convergence almost surely
or almost everywhere ($X_n \to X$ ($P$-a.e.)); convergence in distribution ($X_n \xrightarrow{d} X$,
or $X_n \xrightarrow{law} X$, or $\mathrm{Law}(X_n) \to \mathrm{Law}(X)$, or $\mathrm{Law}(X_n) \xrightarrow{w} \mathrm{Law}(X)$); $L^p$-convergence,
$p > 0$, ($X_n \xrightarrow{L^p} X$); point-wise convergence ($X_n(\omega) \to X(\omega)$, $\omega \in \Omega$). (See
[P §2.10].)

• In addition to the various types of convergence of sequences of random
variables, in probability theory one also deals with convergence of probability
measures and convergence of probability distributions and their characteristics.

One of the most important types of convergence of probability measures is the
weak convergence $P_n \xrightarrow{w} P$, for a given sequence of probability measures $P_n$, $n \ge 1$,
and a probability measure $P$, defined on various metric spaces, including the
spaces $\mathbb{R}^n$, $\mathbb{R}^\infty$, $C$ and $D$ that were introduced earlier.

Many classical results from probability theory (e.g., the central limit theorem,
Poisson theorem, convergence to infinitely divisible distributions, etc.), are essentially
statements about weak convergence of certain sequences of probability
measures (see [P Chap. 3]).

• Most fundamental results in probability theory (e.g., the zero-one law, the
strong law of large numbers, the law of the iterated logarithm) are concerned
exclusively with properties that hold "with probability 1" (or "almost surely"). A
particularly interesting and useful result is contained in the Borel–Cantelli lemma:

Let $A_1, A_2, \ldots$ be any sequence of events and let $\{A_n \text{ i.o.}\}$ ($= \limsup_n A_n = \bigcap_{n=1}^{\infty} \bigcup_{k \ge n} A_k$)
stand for the set of those $\omega \in \Omega$ which belong to infinitely many events from the sequence
$A_1, A_2, \ldots$ Then
(a) $\sum_{n=1}^{\infty} P(A_n) < \infty$ implies that $P\{A_n \text{ i.o.}\} = 0$;
(b) If the events $A_1, A_2, \ldots$ are independent, then $\sum_{n=1}^{\infty} P(A_n) = \infty$ implies that
$P\{A_n \text{ i.o.}\} = 1$.

(See [P Chap. 4] for more details.)


A.4 Stationary (in Strict Sense) Random Sequences


• The random sequence $X = (X_1, X_2, \ldots)$, defined on the probability space
$(\Omega, \mathcal{F}, P)$, is said to be stationary in strict sense if its distribution law, $\mathrm{Law}(X)$
(or, equivalently, its probability distribution, $P_X$), coincides with the distribution law,
$\mathrm{Law}(\theta_k X)$, of the "shifted" sequence $\theta_k X = (X_{k+1}, X_{k+2}, \ldots)$, for any $k \ge 1$.

It is convenient to study the probabilistic properties of such sequences (as is done
in [P Chap. 5]) by using the notions, ideas and methods of the theory of dynamical
systems.

• The main objects of study in dynamical systems theory are the (measurable)
measure-preserving transformations of a given configuration space.

The map $T \colon \Omega \to \Omega$ is said to be measurable if, for any given $A \in \mathcal{F}$, the
set $T^{-1}A = \{\omega : T\omega \in A\}$ belongs to $\mathcal{F}$, or $T^{-1}A \in \mathcal{F}$ for short. The map
$T \colon \Omega \to \Omega$ is said to be a measure-preserving transformation (of the configuration
space $\Omega$) if it is measurable (for $\mathcal{F}$) and

$$P(T^{-1}A) = P(A), \quad \text{for every } A \in \mathcal{F}.$$

The intrinsic connection between "stationary in strict sense random sequences" and
"measure-preserving transformations" can be explained as follows.

Let $T$ be any measure-preserving transformation and let $X_1 = X_1(\omega)$ be any
random variable on $\Omega$ (automatically measurable for $\mathcal{F}$). Given any $n \ge 1$, define
$X_n(\omega) = X_1(T^{n-1}\omega)$, where $T^{n-1} = T \circ T \circ \cdots \circ T$ ($(n-1)$ times) is the $(n-1)$st
power of $T$ (as a transformation of $\Omega$). The sequence $X = (X_1, X_2, \ldots)$ is easily
seen to be stationary in strict sense.

The converse statement can also be made, if one is allowed to reconstruct the
probability space. Specifically, if $X = (X_1, X_2, \ldots)$ is any stationary in strict sense
random sequence (defined on some probability space $(\Omega, \mathcal{F}, P)$), then it is possible
to produce a probability space $(\widetilde{\Omega}, \widetilde{\mathcal{F}}, \widetilde{P})$, on which one can construct a measure-preserving
(for $\widetilde{P}$) transformation $\widetilde{T} \colon \widetilde{\Omega} \to \widetilde{\Omega}$ and a random variable $\widetilde{X}_1 = \widetilde{X}_1(\widetilde{\omega})$,
so that $\mathrm{Law}(\widetilde{X}) = \mathrm{Law}(X)$, where $\widetilde{X} = (\widetilde{X}_1(\widetilde{\omega}), \widetilde{X}_1(\widetilde{T}\widetilde{\omega}), \ldots)$.

The main results of [P Chap. 5] are concerned with the fundamental properties
of certain measure-preserving transformations, such as recurrence ("Poincaré recurrence
theorem"), ergodicity and mixing. The key result in that chapter is the
Birkhoff–Khinchin theorem, one variant of which (that covers both measure-preserving
transformations and stationary in strict sense random sequences) can be
stated as follows:


(a) Let $T$ be some measure-preserving ergodic transformation on $(\Omega, \mathcal{F}, P)$ and
let $\xi = \xi(\omega)$ be any random variable on $\Omega$ with $E|\xi| < \infty$. Then

$$\lim_{n} \frac{1}{n} \sum_{k=0}^{n-1} \xi(T^k \omega) = E\xi \quad (P\text{-a.e.}).$$

(b) Let $X = (X_1, X_2, \ldots)$ be any stationary in strict sense ergodic sequence of
random variables on $(\Omega, \mathcal{F}, P)$, for which $E|X_1| < \infty$. Then

$$\lim_{n} \frac{1}{n} \sum_{k=1}^{n} X_k(\omega) = EX_1 \quad (P\text{-a.e.}).$$

A.5 Stationary (in Broad Sense) Random Sequences 


From the point of view of both theory and practice, in the study of random sequences
of the form $X = (X_n)$, it is important to allow the random variables $X_n$ to take
complex values and to be defined for all $n \in \mathbb{Z} = \{0, \pm 1, \pm 2, \ldots\}$. We will then
write $X = (\ldots, X_{-1}, X_0, X_1, \ldots)$ and will suppose that each $X_n$ is a complex
random variable of the form $(a_n + i b_n)$ with $E|X_n|^2 = E(a_n^2 + b_n^2) < \infty$ for
all $n \in \mathbb{Z}$ (see [P §6.1]).

Our main assumption "stationarity in broad sense" comes down to $EX_n = EX_0$
and $\operatorname{cov}(X_{n+m}, X_m) = \operatorname{cov}(X_n, X_0)$, for all $n, m \in \mathbb{Z}$.

Without any loss of generality we may and do suppose that $EX_0 = 0$, so that
$\operatorname{cov}(X_n, X_0) = EX_n \overline{X}_0$. The function $R(n) = EX_n \overline{X}_0$, $n \in \mathbb{Z}$, is called the
covariance function of the sequence $X$.

e The following two results (the Herglotz theorem and the spectral representation 
theorem) demonstrate that, by nature, a stationary in broad sense random sequence 
is nothing but an infinite sum of harmonics with random amplitudes, the summation 
being taken (with an appropriate limiting procedure) over the entire range of 
frequencies of the harmonics. 

The first result (see [P §6.1]) states that every covariance function R(n), n ∈ Z,
admits the spectral representation

$$R(n) = \int_{-\pi}^{\pi} e^{i\lambda n}\, F(d\lambda), \quad \text{for all } n \in \mathbb{Z},$$

where F = F(B), B ∈ ℬ([−π, π)), is some finite real-valued measure, and the
integral is understood in the sense of Lebesgue–Stieltjes.
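A minimal sketch of the first result in the simplest case, assuming (this is not from the text) that the spectral measure F is purely atomic with finitely many atoms. The integral then reduces to a finite sum, and one can check the two basic properties of a covariance function: R(−n) is the complex conjugate of R(n), and the quadratic form Σ a_j ā_k R(j − k) is non-negative. The atoms in `ATOMS` and the function names are hypothetical.

```python
import cmath

# (frequency lambda_k in [-pi, pi), mass F({lambda_k}) > 0): illustrative atoms
ATOMS = [(0.5, 1.0), (-1.2, 0.25), (2.0, 2.0)]

def covariance(n, atoms=ATOMS):
    """R(n) = sum_k F({lambda_k}) * exp(i * lambda_k * n)."""
    return sum(mass * cmath.exp(1j * lam * n) for lam, mass in atoms)

def quadratic_form(a, atoms=ATOMS):
    """sum_{j,k} a_j * conj(a_k) * R(j - k), non-negative for a covariance R."""
    size = len(a)
    return sum(a[j] * a[k].conjugate() * covariance(j - k, atoms)
               for j in range(size) for k in range(size))

if __name__ == "__main__":
    print(covariance(0))                       # total mass of F
    print(quadratic_form([1 + 2j, -0.5j, 3.0, 1 - 1j]))
```

The non-negativity follows because the quadratic form equals Σ_k F({λ_k}) |Σ_j a_j e^{iλ_k j}|^2.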


The second result (see [P §6.3]) gives the spectral representation of the random
sequence X = (X_n)_{n∈Z}:

$$X_n = \int_{-\pi}^{\pi} e^{i\lambda n}\, Z(d\lambda) \quad (P\text{-a.e.}), \quad \text{for all } n \in \mathbb{Z},$$




where Z = Z(Δ), Δ ∈ ℬ([−π, π)), is some orthogonal (generally, complex-valued)
random measure with EZ(Δ) = 0 and E|Z(Δ)|^2 = F(Δ) (recall that in
our setting EX_0 = 0).

If they exist, the spectral function F = F(λ) and the spectral density f =
f(λ) (related by F(B) = ∫_B f(λ) dλ, B ∈ ℬ([−π, π))), play a fundamental
role in the spectral and correlation analysis of the random sequence X, providing a
description of the “spectral composition” of the covariance function.

At the same time, the relation E|Z(Δ)|^2 = F(Δ) reveals the connection between
the spectral function and the “stochastic spectral component” in the representation
X_n = ∫_{−π}^{π} e^{iλn} Z(dλ), n ∈ Z.

• Given the intrinsic nature of the spectral properties outlined above, it is
easy to understand why results of this type are so important in the statistics of 
stationary sequences and the statistics of random processes in continuous time. More 
specifically, these features allow one to construct “reasonably good” estimates of the 
covariance function, the spectral density and their characteristics (see [ P §5.4]). All 
of this is instrumental for building probabilistic models of observable phenomena, 
which are consistent with the data derived from experiments. 

Finally, we note that the pioneering work of A. N. Kolmogorov and N. Wiener 
on the theory of filtering, extrapolation and interpolation of random sequences and 
processes, was developed almost entirely in the context of stationary in the broad 
sense random sequences and processes (see [ P §6.6]). 


A.6 Martingales 


In the very early stages of the development of the general theory of martingales
it was recognized that it would be extremely useful to augment the underlying
probability space (Ω, ℱ, P) with a flow of σ-algebras, i.e., a filtration, of the form
(ℱ_n)_{n≥0}, where ℱ_n ⊆ ℱ. The filtration has the meaning of a “flow of information,”
i.e., each ℱ_n comprises all “pieces of information” that an observer may be able to
receive by time n. The structure (Ω, ℱ, (ℱ_n)_{n≥0}, P) is called a filtered probability
space. With any such structure one can associate the notions “adapted” (to the
filtration (ℱ_n)_{n≥0}), “predictable,” “stochastic sequence,” “martingale,” “Markov
times,” “stopping times,” etc.

• The sequence of random variables X = (X_n)_{n≥0}, defined on the structure
(Ω, ℱ, (ℱ_n)_{n≥0}, P), is said to be adapted to the filtration (ℱ_n)_{n≥0} if X_n is
ℱ_n-measurable for every n ≥ 0. The same sequence is said to be a martingale on
(Ω, ℱ, (ℱ_n)_{n≥0}, P) if, in addition to being adapted to (ℱ_n)_{n≥0}, it is integrable, in
that E|X_n| < ∞, n ≥ 0, and has the property

$$E(X_n \mid \mathcal{F}_{n-1}) = X_{n-1}, \quad \text{for all } n \ge 1.$$

If the equality in the above relation is replaced by the inequality E(X_n | ℱ_{n−1}) ≥
X_{n−1}, or the inequality E(X_n | ℱ_{n−1}) ≤ X_{n−1}, then the sequence X = (X_n)_{n≥0} is
said to be, respectively, a submartingale or a supermartingale.
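The defining relations can be checked by exact enumeration in a small example. The sketch below assumes (as an illustration, not something stated in the text) a simple symmetric ±1 random walk S_n with its natural filtration, so that conditioning on ℱ_{n−1} amounts to fixing the first n − 1 steps. It verifies that S_n is a martingale and S_n^2 is a submartingale. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def conditional_expectation(f, prefix, p=Fraction(1, 2)):
    """E[f(first n steps) | first n-1 steps = prefix], with P(step = +1) = p."""
    return p * f(prefix + (1,)) + (1 - p) * f(prefix + (-1,))

def walk(steps):
    """S_n = sum of the steps (S_0 = 0 for the empty prefix)."""
    return sum(steps)

def walk_squared(steps):
    return sum(steps) ** 2

def check_walk(n_max=5):
    """Return (martingale property of S_n, submartingale property of S_n^2)."""
    is_martingale, is_submartingale = True, True
    for n in range(1, n_max + 1):
        for prefix in product((1, -1), repeat=n - 1):
            is_martingale &= conditional_expectation(walk, prefix) == walk(prefix)
            is_submartingale &= (conditional_expectation(walk_squared, prefix)
                                 >= walk_squared(prefix))
    return is_martingale, is_submartingale

if __name__ == "__main__":
    print(check_walk())
```

Here E(S_n^2 | ℱ_{n−1}) = S_{n−1}^2 + 1, which is why the inequality for S_n^2 is strict.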

• The class of martingales includes many special sequences of random variables,
encountered in many important practical applications (see [P §7.1]). More
importantly, the general theory of martingales provides methods, insights and 
computational tools that are indispensable for certain aspects of probability theory 
and mathematical statistics—especially in connection with some important practical 
applications. The key insights from the martingale theory are: the invariance of 
the martingale property under random time-change (see [ P §7.2]), the fundamental 
inequalities for martingales and submartingales (see [ P §7.3]) and the convergence 
theorems for martingales and submartingales (see [ P §7.4]). 

Some of the most important practical applications of martingale theory, namely
the probability of ruin in insurance, the martingale characterization of the absence
of arbitrage in financial markets, the construction of hedging strategies in complete
financial markets and the optimal stopping problem, are discussed in [P §7.10]
through [P §7.13].


A.7 Markov Chains 


In what follows we will expand and reformulate some of the main results from 
the general theory of Markov chains that was developed in [P Chap. 8]. The 
notation and the terminology introduced in [ P Chap. 8] will be assumed, but will be 
modified and expanded, in connection with some new topics that were not included 
in [ P Chap. 8]. 

• Similarly to martingales, a generic Markov chain (in broad sense), X =
(X_n)_{n≥0}, can be treated as a sequence of random variables that are defined on some
filtered probability space (Ω, ℱ, (ℱ_n)_{n≥0}, P) and take values in some set E, called
the “state space” of the Markov chain X. The state space E will be endowed with
the structure of a measurable space and will be denoted by (E, ℰ). As a sequence
of random variables, the Markov chain X = (X_n)_{n≥0} will always be assumed to
be adapted to the filtration (ℱ_n)_{n≥0}, in the sense that X_n is ℱ_n/ℰ-measurable for
every n ≥ 0. The fundamental property that characterizes X = (X_n)_{n≥0} as a Markov
chain in broad sense can be stated as follows: for every n ≥ 0 and every B ∈ ℰ one
has

$$P(X_{n+1} \in B \mid \mathcal{F}_n)(\omega) = P(X_{n+1} \in B \mid X_n)(\omega) \quad (P\text{-a.e.}).$$


(With a slight abuse of the notation, we will write P(X_{n+1} ∈ B | X_n(ω)) instead of
P(X_{n+1} ∈ B | X_n)(ω).)

If the filtration (ℱ_n)_{n≥0} happens to be the natural filtration of the sequence
X = (X_n)_{n≥0}, i.e., ℱ_n = ℱ_n^X = σ(X_0, X_1, ..., X_n) for every n ≥ 0, then the
Markov property in broad sense becomes the Markov property in strict sense, and, if
this property holds, the sequence X = (X_n)_{n≥0} is said to be a Markov chain.

In the special case where (E, ℰ) is a Borel space, [P §2.7, Theorem 5]
guarantees that for every fixed n ≥ 0 there is a regular conditional probability
P_n(x; B), x ∈ E, B ∈ ℰ, with the property that for every B ∈ ℰ one can write

$$P(X_n \in B \mid X_{n-1}(\omega)) = P_n(X_{n-1}(\omega); B), \quad \text{for } P\text{-a.e. } \omega \in \Omega.$$

In the theory of Markov processes, the regular conditional probabilities P_n(x; B),
n ≥ 0, are called transition functions (from E to ℰ), or Markov kernels. In the
special case where the transition functions do not depend on n, i.e., one can write
P_n(x; B) = P(x; B), the associated Markov chain (in broad sense or in strict sense)
is said to be homogeneous.

Another important element of the construction of any Markov chain, in addition
to the transition functions P_n(x; B), n ≥ 0, is the initial distribution π = π(B),
B ∈ ℰ, which is simply the probability distribution of the random variable X_0, i.e.,

$$\pi(B) = P\{X_0 \in B\}, \quad B \in \mathcal{E}.$$

The initial distribution and the transition functions, i.e., the entire collection (π, P_1,
P_2, ...), which in the homogeneous case comes down to the pair (π, P), uniquely
determine the probability distribution (as a random sequence) of the Markov chain
X = (X_0, X_1, ...).
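For a finite state space and a homogeneous chain, the way the pair (π, P) determines the law of the chain can be sketched as follows: the probability of a path (x_0, ..., x_n) is π(x_0) p(x_0, x_1) ⋯ p(x_{n−1}, x_n). The two-state chain below, with the hypothetical names `PI`, `P`, `path_probability`, `marginal` and `step`, is an assumption made purely for illustration.

```python
from fractions import Fraction as Fr
from itertools import product

# Illustrative two-state chain (not from the text)
PI = {"a": Fr(1, 4), "b": Fr(3, 4)}
P = {"a": {"a": Fr(1, 2), "b": Fr(1, 2)},
     "b": {"a": Fr(1, 3), "b": Fr(2, 3)}}

def path_probability(path, pi=PI, p=P):
    """P{X_0 = x_0, ..., X_n = x_n} = pi(x_0) * p(x_0, x_1) * ... * p(x_{n-1}, x_n)."""
    prob = pi[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= p[x][y]
    return prob

def marginal(n, pi=PI, p=P):
    """Distribution of X_n, obtained by summing the probabilities of all paths."""
    dist = {s: Fr(0) for s in pi}
    for path in product(pi, repeat=n + 1):
        dist[path[-1]] += path_probability(path, pi, p)
    return dist

def step(dist, p=P):
    """One step of the distribution: (dist P)(y) = sum_x dist(x) p(x, y)."""
    return {y: sum(dist[x] * p[x][y] for x in dist) for y in dist}

if __name__ == "__main__":
    print(marginal(3))
    print(step(step(step(PI))))
```

Both printed distributions should coincide, since summing path probabilities over all paths ending at y gives (πP^n)(y).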

• Following the modern treatment of the subject, [P Chap. 8] adopts the view that
the main building blocks in the general theory of Markov chains are the state space
(E, ℰ) and the collection of transition functions P_n(x; B), x ∈ E, B ∈ ℰ, n ≥ 0,
from E to ℰ (which reduces to a single transition function P(x; B), x ∈ E, B ∈ ℰ,
in the homogeneous case). This was a departure from the classical framework,
in which the starting point is the filtered probability space (Ω, ℱ, (ℱ_n)_{n≥0}, P),
the state space (E, ℰ), and the sequence X = (X_0, X_1, ...) of E-valued random
variables, chosen so that each X_n is ℱ_n/ℰ-measurable. According to the Ionescu
Tulcea Theorem (see [P §2.9]), for any given state space (E, ℰ) and any given
family of transition functions from E to ℰ, one can take (Ω, ℱ) to be the canonical
coordinate space (E^∞, ℰ^∞) and then construct a family of probability measures,
{P_x, x ∈ E}, on (Ω, ℱ), in such a way that the sequence of coordinate maps,
X = (X_0, X_1, ...), given by X_n(ω) = x_n for ω = (x_0, x_1, ...), n ≥ 0, forms
a Markov chain under the probability measure P_x, with P_x{X_0 = x} = 1, for
every x ∈ E. In other words, under the probability measure P_x (on (E^∞, ℰ^∞)) the
sequence of coordinate maps X = (X_0, X_1, ...) (from Ω into E) behaves as a Markov
chain that starts from x ∈ E with probability 1.

Given any probability law π = π(B), B ∈ ℰ (think of this law as the “initial”
distribution of some Markov chain), we denote by P_π the probability measure on
(E^∞, ℰ^∞) given by P_π(A) = ∫_E P_x(A) π(dx), A ∈ ℰ^∞. It is not very difficult
to check that under the probability measure P_π the sequence of coordinate maps X
behaves as a Markov chain with initial distribution π, i.e., P_π{X_0 ∈ B} = π(B),
for every B ∈ ℰ.

• In order to formulate two new variants of the Markov property, the so-called
generalized Markov property and the strong Markov property, we must introduce
the shift operator θ, its “powers” θ_n, and the “random power” θ_τ, for any given
Markov time τ. The shift operator θ: Ω → Ω is defined as

$$\theta(\omega) = (x_1, x_2, \ldots), \quad \text{for } \omega = (x_0, x_1, \ldots).$$


In other words, the operator θ “shifts” the time-scale one period forward (period
1 becomes period 0, period 2 becomes period 1, and so on), as a result of which
the trajectory (x_0, x_1, ...) turns into (x_1, x_2, ...). (Recall that in [P Chap. 5], which
deals with stationary in strict sense random sequences and the related dynamical
systems, we also had to introduce certain transformations of Ω into itself, which
were denoted by T.)

If θ_0 = I stands for the identity map θ_0(ω) = ω, the n-th power, θ_n, of the
operator θ is defined for n ≥ 1 as θ_n = θ_{n−1} ∘ θ (= θ ∘ θ_{n−1}), i.e., θ_n(ω) =
θ_{n−1}(θ(ω)).

Given any Markov time τ = τ(ω) with τ ≤ ∞, we denote by θ_τ the operator
that acts only on the set Ω_τ = {ω : τ(ω) < ∞} in such a way that θ_τ = θ_n on the
set {τ = n}, i.e., if ω ∈ Ω is such that τ(ω) = n, then

$$\theta_\tau(\omega) = \theta_n(\omega).$$

If H = H(ω) is any ℱ-measurable function of ω ∈ Ω (such as, for example,
τ = τ(ω), or X_m = X_m(ω)), then the function H ∘ θ_n is defined as (H ∘ θ_n)(ω) =
H(θ_n(ω)), ω ∈ Ω.

If σ is a Markov time, then the function H ∘ θ_σ is defined on the set Ω_σ = {ω :
σ(ω) < ∞} so that for every fixed n ∈ {0, 1, ...} one has H ∘ θ_σ = H ∘ θ_n
everywhere in the subset {σ = n} ⊆ Ω_σ, i.e., (H ∘ θ_σ)(ω) = (H ∘ θ_n)(ω) =
H(θ_n(ω)), for every ω ∈ {σ = n}, and every n = 0, 1, ....

In particular, the above relations imply that, for any m, n = 0, 1, ... and for any
Markov time σ, one has

$$X_m \circ \theta_n = X_{m+n}, \qquad X_m \circ \theta_\sigma = X_{m+\sigma} \quad \text{on the set } \Omega_\sigma.$$

Furthermore, for every two finite Markov times, τ and σ, one has

$$X_\tau \circ \theta_\sigma = X_{\tau \circ \theta_\sigma + \sigma}.$$


The operators θ_n: Ω → Ω give rise to the inverse operators θ_n^{-1}: ℱ → ℱ,
defined in the obvious way as

$$\theta_n^{-1}(A) = \{\omega : \theta_n(\omega) \in A\}, \quad A \in \mathcal{F}.$$

If in the last relation the set A is replaced by the set {ω : X_m(ω) ∈ B}, for some
B ∈ ℰ, then one can write

$$\theta_n^{-1}(A) = \{\omega : X_{m+n}(\omega) \in B\},$$

which is the same as

$$\theta_n^{-1}(X_m^{-1}(B)) = X_{m+n}^{-1}(B).$$

(Additional properties of the operators θ_n, θ_σ, θ_n^{-1}, etc., can be found in some of the
problems included in Sect. 8.2 in the present book.)
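The action of the shift operators can be tried out on finite windows of a trajectory. The sketch below is an illustration under an obvious simplification that is not in the text: a trajectory ω = (x_0, x_1, ...) is truncated to a finite tuple, and a Markov time returns `None` when its value is not reached inside the window. The helper names `shift`, `coordinate`, `first_hit` and `shift_at` are hypothetical.

```python
def shift(omega, n=1):
    """theta_n: drop the first n coordinates of the trajectory."""
    return omega[n:]

def coordinate(m):
    """Coordinate map X_m(omega) = x_m."""
    return lambda omega: omega[m]

def first_hit(state):
    """Markov time inf{n >= 0 : X_n = state}; None if not reached in the window."""
    def sigma(omega):
        for n, x in enumerate(omega):
            if x == state:
                return n
        return None
    return sigma

def shift_at(sigma, omega):
    """theta_sigma, defined only where sigma(omega) is finite."""
    n = sigma(omega)
    if n is None:
        raise ValueError("sigma is not finite on this trajectory window")
    return shift(omega, n)

if __name__ == "__main__":
    omega = (3, 1, 4, 1, 5, 9, 2, 6)
    sigma, tau = first_hit(4), first_hit(5)
    shifted = shift_at(sigma, omega)
    print(coordinate(2)(shift(omega, 3)), omega[5])            # X_2 o theta_3 = X_5
    print(coordinate(tau(shifted))(shifted),
          omega[tau(shifted) + sigma(omega)])                  # X_tau o theta_sigma
```

Each printed pair should consist of two equal numbers, matching the identities X_m ∘ θ_n = X_{m+n} and X_τ ∘ θ_σ = X_{τ∘θ_σ + σ}.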

• With the help of the operators θ_n one can establish (see [P §8.2, Theorem 1])
the so-called generalized Markov property: if H = H(ω) is any bounded (or
non-negative) and ℱ-measurable function, then for every choice of the initial
distribution π and for every integer n ≥ 0 one has

$$E_\pi(H \circ \theta_n \mid \mathcal{F}_n^X)(\omega) = E_{X_n(\omega)} H \quad (P_\pi\text{-a.e.}).$$

In the above relation E_π denotes the averaging over the measure P_π and the
expression E_{X_n(ω)} H is understood as ψ(X_n(ω)), where ψ(x) = E_x H.

In fact, the generalized Markov property can be generalized (i.e., weakened) even
further, in that one can replace the deterministic time n in the above relation with
some finite Markov time τ. To be precise, one can claim the following: if (H_n)_{n≥0}
is any family of bounded (or non-negative) and ℱ-measurable functions and if τ
is any finite Markov time, then the Markov property implies the so-called strong
Markov property, according to which for any initial distribution π one has

$$E_\pi(H_\tau \circ \theta_\tau \mid \mathcal{F}_\tau^X)(\omega) = \psi(\tau(\omega), X_{\tau(\omega)}(\omega)) \quad (P_\pi\text{-a.e.}),$$

where ψ(n, x) = E_x H_n.

Note that the expression H_τ ∘ θ_τ = (H_τ ∘ θ_τ)(ω) is understood as (H_n ∘ θ_n)(ω)
for ω ∈ {τ = n}.

• As was pointed out earlier, the distribution (as a random sequence) of any
homogeneous Markov chain X = (X_n)_{n≥0} with state space (E, ℰ) is completely
determined by the initial distribution π = π(dx) and the transition function
P = P(x; B), x ∈ E, B ∈ ℰ. Furthermore, the distributions P_x, x ∈ E, which are
defined on (E^∞, ℰ^∞), are determined only by the transition function P = P(x; B).

It is interesting that the concept of transition functions (or Markov kernels) also
lies at the core of the (entirely deterministic) domain of mathematical analysis
known as potential theory. In fact, there is an intrinsic connection between
potential theory and the theory of homogeneous Markov chains. This connection 
has been extremely beneficial for both fields. 

We will now introduce some important notions in both potential theory and the 
Markovian theory, which will be needed later in this section. 

The transition function P = P(x; B), x ∈ E, B ∈ ℰ, gives rise to the linear
(one-step) transition operator P, which acts on functions g = g(x) according to
the formula

$$Pg(x) = \int_E g(y)\, P(x; dy).$$

(It is quite common to also write (Pg)(x).) The domain of the operator P consists of
all g ∈ ℒ^0(E, ℰ; R) (= the space of all ℰ-measurable functions on E with values
in R), for which the integral ∫_E g(y) P(x; dy) is well defined for all x ∈ E. Clearly,
this integral is well defined also on the class of all non-negative and ℰ-measurable
functions on E, which class we denote by ℒ^0(E, ℰ; R_+), or on the class of all
bounded functions in ℒ^0(E, ℰ; R).

Letting I denote the identity operator Ig(x) = g(x), one can define the n-step
transition operator P_n as P_n = P(P_{n−1}) for n ≥ 1, or, equivalently, P_n = P_{n−1}(P)
for n ≥ 1, with the understanding that P_0 = I.

It is clear that one has

$$P_n g(x) = E_x g(X_n)$$

for every g ∈ ℒ^0(E, ℰ; R) for which the integral ∫_E g(y) P^n(x; dy) is well
defined, where P^n = P^n(x; dy) is the n-step transition function (see [P §8.1]).

Given any Markov time τ for the filtration (ℱ_n^X)_{n≥0} (ℱ_n^X = σ(X_0, X_1, ...,
X_n)), let P_τ denote the operator that acts on functions g = g(x) according to the
formula

$$P_\tau g(x) = E_x\big[I(\tau < \infty)\, g(X_\tau)\big].$$

Notice that if g(x) ≡ 1, then

$$P_\tau 1(x) = P_x\{\tau < \infty\}.$$


The operators P_n, n ≥ 0, give rise to the (generally, unbounded) operator

$$U = \sum_{n \ge 0} P_n,$$

which is called the potential of the operator P (or the potential of the associated
Markov chain).

For any g ∈ ℒ^0(E, ℰ; R_+) one has

$$Ug = \sum_{n \ge 0} P_n g = (I + PU) g,$$

which may be abbreviated as

$$U = I + PU.$$

The function Ug is usually called the potential of the function g.

If the function g(x) is taken to be the indicator of the set B ∈ ℰ, i.e., g(x) =
I_B(x), then N_B = Σ_{n≥0} I_B(X_n) is simply the number of visits of the chain X to
the set B, and one can write:

$$U I_B(x) = \sum_{n \ge 0} E_x I_B(X_n) = E_x N_B.$$

For any fixed x ∈ E, treated as a function of B ∈ ℰ, the quantity U(x, B) =
U I_B(x) gives a measure on ℰ, which is sometimes called the potential measure.
Choosing B to be a singleton, namely B = {y}, for y ∈ E, turns U(x, {y}) into a
function of x, y ∈ E, which is usually denoted by G(x, y) and is called the Green
function (of the operator P, or of the associated Markov chain). The meaning of
the Green function should be clear: G(x, y) = E_x N_{{y}} is nothing but the average
number of visits to state y ∈ E, starting from state X_0 = x ∈ E.
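As a small numerical sketch of the potential and the Green function, consider (as an assumption made only for illustration) the simple symmetric walk on {0, 1, ..., N} that is absorbed at 0 and N. For an interior state y the series Σ_n P_n I_{{y}} converges, and its partial sums approximate G(x, y). The code also checks the identity U = I + PU on the function I_{{y}}. The names `apply_P`, `potential` and `green_column` are hypothetical.

```python
N = 4                      # illustrative chain: states 0..N, absorbed at 0 and N
STATES = range(N + 1)

def apply_P(g):
    """One-step operator: (Pg)(x) = E_x g(X_1)."""
    out = []
    for x in STATES:
        if x in (0, N):
            out.append(g[x])                        # absorbing state
        else:
            out.append(0.5 * g[x - 1] + 0.5 * g[x + 1])
    return out

def potential(g, n_terms=400):
    """Partial sum of Ug = sum_{n>=0} P_n g, truncated after n_terms terms."""
    total = [0.0] * len(g)
    term = list(g)                                  # P_0 g = g
    for _ in range(n_terms):
        total = [t + s for t, s in zip(total, term)]
        term = apply_P(term)                        # P_{n+1} g = P(P_n g)
    return total

def green_column(y):
    """x -> G(x, y) = U I_{y}(x), the expected number of visits to y."""
    return potential([1.0 if x == y else 0.0 for x in STATES])

if __name__ == "__main__":
    print(green_column(2))   # approximately [0, 1, 2, 1, 0]
```

For the absorbing states themselves the number of visits is infinite, which is one way to see why U is described as generally unbounded.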

Analogously to the potential U of the operator P, one can define the kernel Q =
Q(x; B) of the transition function P = P(x; B) by the formula

$$Q(x; B) = \sum_{n \ge 0} P^n(x; B) \quad \big( = I_B(x) + PQ(x; B) \big).$$

Since P_n I_B(x) = P^n(x; B), it is clear that U(x; B) = Q(x; B).
• The operator P gives rise to another important operator, namely

$$L = P - I,$$

where, as usual, I denotes the identity operator. In Markovian theory the operator L
is called the generating operator (a.k.a. the discrete generator) of the homogeneous
Markov chain with transition function P = P(x; B). The domain, D_L, of the
operator L is the space of all g ∈ ℒ^0(E, ℰ; R) for which the expression Pg − g is
well defined.

If h ∈ ℒ^0(E, ℰ; R_+) (i.e., h takes values in R_+ and is ℰ-measurable), then,
since U = I + PU, its potential H = Uh satisfies the relation

$$H = h + PH.$$

Consequently, if H ∈ D_L, then H solves the (non-homogeneous) Poisson equation

$$LV = -h, \quad V \in D_L.$$

If one can find a solution, W ∈ ℒ^0(E, ℰ; R_+), of the equation W = h + PW
(or of the equation LW = −h, when W ∈ D_L), then, since W = h + PW ≥ h,
one can show by induction that W ≥ Σ_{k=0}^{n} P_k h for any n ≥ 1, so that W ≥ H. As
a result, the potential H = Uh is the smallest solution to the system V = h + PV
within the class ℒ^0(E, ℰ; R_+) (recall that Uh(x) = E_x Σ_{k=0}^{∞} h(X_k)).

• A function f = f(x), x ∈ E, that belongs to the class ℒ^0(E, ℰ; R_+), is
said to be excessive for the operator P (or for the associated Markov chain with
transition function P = P(x; B)), if

$$Pf \le f,$$

or, which amounts to the same, E_x f(X_1) ≤ f(x), for all x ∈ E. In particular, the
potential H = Uh of any function h ∈ ℒ^0(E, ℰ; R_+) is an excessive function.

The function f ∈ ℒ^0(E, ℰ; R_+) is said to be harmonic (or invariant), if

$$Pf = f,$$

i.e., E_x f(X_1) = f(x), for all x ∈ E.

The connection between potential theory (to which the notion of excessivity
belongs) and probability theory (specifically, the martingale theory) becomes
evident from the following statement: if X = (X_n)_{n≥0} is any homogeneous Markov
chain with initial distribution π and with transition function P = P(x; B), if
the associated distribution in the space (E^∞, ℰ^∞) is P_π, and if f = f(x) is
any P-excessive function, then one can claim that Y = (Y_n, ℱ_n^X, P_π)_{n≥0}, with
Y_n = f(X_n), is a non-negative supermartingale sequence, in that:

Y_n is ℱ_n^X-measurable, for all n ≥ 0;

E_π(Y_{n+1} | ℱ_n^X) ≤ Y_n (P_π-a.e.), for all n ≥ 0.

If, in addition, one can claim that E_π Y_n < ∞, for all n ≥ 0, then Y =
(Y_n, ℱ_n^X, P_π)_{n≥0} is simply a supermartingale.

It is interesting to point out that some of the main properties of
non-negative supermartingales (see [P §7.4, Theorem 1]) continue to hold also
for non-negative supermartingale sequences of the type described above: the limit
lim_n Y_n (= Y_∞) exists with P_π-probability 1; furthermore, if P_π{Y_0 < ∞} = 1,
then P_π{Y_∞ < ∞} = 1. The proof of this claim is delegated to Problem 7.4.24.

• Given any h ∈ ℒ^0(E, ℰ; R_+), or h ∈ ℒ^0(E, ℰ; R̄_+), the potential, H(x) =
Uh(x), satisfies the relation H(x) = h(x) + PH(x), which, in turn, gives

$$H(x) \ge \max(h(x), PH(x)), \quad x \in E.$$

Consequently, the potential H(x) = Uh(x) both dominates the function
h(x) (i.e., H(x) ≥ h(x), x ∈ E) and belongs to the class of excessive functions
(one usually says that the potential of a given function is an example of an excessive
majorant of that function).

In fact, many practical problems—the optimal stopping problem from [P §8.9] 
being a typical example—can be formulated as problems for computing the smallest 
excessive majorant of a given &-measurable non-negative function g = g(x). 
Potential theory provides a special technique for solving such problems, which is 
described next. 

Let Q denote the operator that acts on all ℰ-measurable non-negative functions
g = g(x) according to the formula

$$Qg(x) = \max(g(x), Pg(x)).$$

Next, notice that the smallest excessive majorant, s(x), of any such function g(x) is
given by the formula

$$s(x) = \lim_{n} Q^n g(x)$$

and satisfies the equation

$$s(x) = \max(g(x), Ps(x)), \quad x \in E.$$

In particular, the last equation implies that for every s ∈ D_L one must have

$$Ls(x) = 0, \quad x \in C_g; \qquad s(x) = g(x), \quad x \in D_g,$$

where C_g = {x : s(x) > g(x)} and D_g = E \ C_g. (The proof of this claim can be
found in [P §8.9], where the token P is replaced by T, and the token Q is replaced
by Q.)
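The iteration s = lim Q^n g can be sketched on a small example. The chain used below, a simple symmetric walk on {0, ..., N} absorbed at the endpoints, and the payoff g are assumptions chosen for illustration. For this chain a function is excessive exactly when it is concave at the interior points, so the limit should be the smallest concave majorant of g. The names `apply_P`, `apply_Q` and `smallest_excessive_majorant` are hypothetical.

```python
N = 4  # illustrative chain: states 0..N, with 0 and N absorbing

def apply_P(f):
    """(Pf)(x) = E_x f(X_1) for the absorbed simple symmetric walk."""
    out = list(f)                                   # absorbing ends: Pf = f there
    for x in range(1, N):
        out[x] = 0.5 * f[x - 1] + 0.5 * f[x + 1]
    return out

def apply_Q(f):
    """(Qf)(x) = max(f(x), Pf(x))."""
    return [max(a, b) for a, b in zip(f, apply_P(f))]

def smallest_excessive_majorant(g, n_iter=200):
    """Approximate s = lim_n Q^n g by applying Q a fixed number of times."""
    s = list(g)
    for _ in range(n_iter):
        s = apply_Q(s)
    return s

if __name__ == "__main__":
    g = [0.0, 0.0, 0.0, 4.0, 0.0]
    s = smallest_excessive_majorant(g)
    print(s)                                        # close to [0, 4/3, 8/3, 4, 0]
    print([x for x in range(N + 1) if s[x] > g[x] + 1e-9])   # the set C_g
```

In this example the set C_g consists of the states 1 and 2, where s is harmonic, while s coincides with g on the remaining states.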

• One of the central issues in potential theory is the description of the class of
solutions to the Dirichlet problem for the operator P: for a given domain C ⊆ E
and two ℰ-measurable non-negative functions h and g, defined, respectively, on C
and D = E \ C, one must find a non-negative function V = V(x), x ∈ E, chosen
from one of the classes ℒ^0(E, ℰ; R_+), ℒ^0(E, ℰ; R̄_+), ℒ^0(E, ℰ; R), etc., which
satisfies the equation

$$V(x) = \begin{cases} PV(x) + h(x), & x \in C; \\ g(x), & x \in D. \end{cases}$$

If one looks for solutions V only in the class D_L, then the above system is equivalent
to the following one:

$$LV(x) = -h(x), \quad x \in C; \qquad V(x) = g(x), \quad x \in D.$$


The first equation above is commonly referred to as “the Poisson equation for the 
domain C” and, usually, the Dirichlet problem is understood as the problem for 
solving the Poisson equation in some domain C, with the requirement that the 
solution is defined everywhere in E and coincides on the complement D = E \ C 
with a given function g. 

It is quite remarkable that the solution to the Dirichlet problem, which is entirely
non-probabilistic, can be expressed in terms of the homogeneous Markov chain
with transition function P = P(x; B), which gives rise to the operator P. To make
this claim precise, let X = (X_n)_{n≥0} be one such Markov chain and let τ(D) =
inf{n ≥ 0 : X_n ∈ D} (with the usual convention inf{∅} = ∞). One can then claim
that for every two functions, h and g, from the class ℒ^0(E, ℰ; R_+), a solution to
the Dirichlet problem exists and the smallest non-negative solution, V_D(x), is given
by the formula:

$$V_D(x) = E_x\big[I(\tau(D) < \infty)\, g(X_{\tau(D)})\big] + I_C(x)\, E_x\Big[\sum_{k=0}^{\tau(D)-1} h(X_k)\Big].$$

(For the proof of the last statement see the hint to Problem 8.8.11.)

Some special choices for h and g are considered next. 

(a) If h = 0, i.e., one is looking for a function V = V(x) which is harmonic in
the domain C and coincides with the function g on D = E \ C, then the smallest
non-negative solution V_D(x) is given by the formula

$$V_D(x) = E_x\big[I(\tau(D) < \infty)\, g(X_{\tau(D)})\big].$$

In particular, if g(x) ≡ 1, x ∈ D, then

$$V_D(x) = P_x\{\tau(D) < \infty\}.$$

At the same time, the probability, P_x{τ(D) < ∞}, that the Markov chain will
eventually reach D, starting from X_0 = x, treated as a function of x ∈ E, can be
claimed to be harmonic in the domain C. It is clear that if x ∈ D, then P_x{τ(D) <
∞} = 1, since in this case τ(D) = 0.

(b) With g(x) = 0, x ∈ D, and h(x) = 1, x ∈ C, the system becomes

$$V(x) = \begin{cases} PV(x) + 1, & x \in C; \\ 0, & x \in D. \end{cases} \qquad (*)$$

In this case the smallest non-negative solution is given by the formula

$$V_D(x) = I_C(x)\, E_x\Big[\sum_{k=0}^{\tau(D)-1} 1\Big] = \begin{cases} E_x \tau(D), & x \in C; \\ 0, & x \in D. \end{cases}$$

In particular, treated as a function of x ∈ E, the expected time, E_x τ(D), until the
first visit to D gives the smallest non-negative solution to the system (*).
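Both special cases can be checked numerically on a small chain. The sketch below assumes, purely for illustration, the simple symmetric walk on {0, ..., N} with C = {1, ..., N−1} and D = {0, N}. It iterates V ← PV + h on C, with V = g on D, starting from the function that vanishes on C, so the iterates increase toward the smallest non-negative solution. For this chain the classical closed forms are E_x τ(D) = x(N − x) and P_x{X_{τ(D)} = N} = x/N. The names `minimal_solution`, `C` and `N` are hypothetical.

```python
N = 4
C = range(1, N)            # domain C = {1, ..., N-1}; D = {0, N} (illustrative)

def minimal_solution(h, g, n_iter=400):
    """Iterate V <- PV + h on C and V = g on D, starting from V = 0 on C."""
    v = [0.0 if x in C else g[x] for x in range(N + 1)]
    for _ in range(n_iter):
        v = [0.5 * v[x - 1] + 0.5 * v[x + 1] + h[x] if x in C else g[x]
             for x in range(N + 1)]
    return v

if __name__ == "__main__":
    ones_on_C = [1.0 if x in C else 0.0 for x in range(N + 1)]
    zeros = [0.0] * (N + 1)
    hit_right = [1.0 if x == N else 0.0 for x in range(N + 1)]
    print(minimal_solution(ones_on_C, zeros))   # expected exit times x * (N - x)
    print(minimal_solution(zeros, hit_right))   # exit probabilities through N
```

The first line corresponds to case (b), the second to case (a) with g equal to the indicator of the point N.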

• A particularly important class of Markov sequences, associated with random
walks on some state space (E, ℰ), is the class of simple symmetric random walks
on the lattice

$$E = \mathbb{Z}^d = \{0, \pm 1, \pm 2, \ldots\}^d,$$

where d is a finite integer chosen from the set {1, 2, ...} (see [P §8.8]). Random
walks in the “entire” space E = Z^d, of the form X = (X_n)_{n≥0}, can be defined
constructively, by setting

$$X_n = x + \xi_1 + \ldots + \xi_n,$$

where ξ_1, ξ_2, ... is a sequence of independent R^d-valued random variables, which
are defined on some probability space (Ω, ℱ, P), and are distributed uniformly in
the set of all basis vectors e = (e_1, ..., e_d) ∈ R^d, defined by e_i = 0, +1 or −1 and
‖e‖ ≡ |e_1| + ... + |e_d| = 1; in particular

$$P\{\xi_1 = e\} = (2d)^{-1}.$$

Such a random walk describes the movement of a “particle” which, starting from
some point x ∈ Z^d, during every period moves at random to one of the 2d
neighboring points on the lattice, in such a way that each neighboring point
is equally likely to get selected.

The operator P, associated with such a random walk, has a particularly simple
form:

$$Pf(x) = E f(x + \xi_1) = \frac{1}{2d} \sum_{\|e\|=1} f(x + e).$$

Consequently, the generating operator (or the discrete generator) L = P − I, which
in this case is referred to as the discrete Laplacian and is commonly denoted by Δ,
has the following form:

$$\Delta f(x) = \frac{1}{2d} \sum_{\|e\|=1} \big(f(x + e) - f(x)\big).$$
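The formula for the discrete Laplacian translates directly into code. The sketch below evaluates Δf at a lattice point using exact rational arithmetic; the test functions (a linear function, x_1^2 − x_2^2, and the squared Euclidean norm) are chosen here as illustrations and do not come from the text. The names `unit_vectors` and `laplacian` are hypothetical.

```python
from fractions import Fraction

def unit_vectors(d):
    """The 2d lattice vectors e with |e_1| + ... + |e_d| = 1."""
    for i in range(d):
        for sign in (1, -1):
            e = [0] * d
            e[i] = sign
            yield tuple(e)

def laplacian(f, x):
    """Discrete Laplacian: (1 / 2d) * sum over unit vectors e of f(x + e) - f(x)."""
    d = len(x)
    total = sum(f(tuple(a + b for a, b in zip(x, e))) - f(x)
                for e in unit_vectors(d))
    return Fraction(total, 2 * d)

if __name__ == "__main__":
    print(laplacian(lambda x: 7 * x[0], (5,)))                    # linear, d = 1
    print(laplacian(lambda x: x[0] ** 2 - x[1] ** 2, (3, -2)))    # harmonic, d = 2
    print(laplacian(lambda x: sum(c * c for c in x), (3, -2, 1))) # squared norm
```

The first two functions are harmonic, so the result is 0, while for the squared norm one gets Δf ≡ 1 in every dimension, since each of the 2d terms contributes ±2x_i + 1.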


It is natural to reformulate the Dirichlet problem for the simple random walk by
taking into account the fact that exit from C ⊆ Z^d can happen only on the
“boundary”

$$\partial C = \{x : x \in \mathbb{Z}^d,\ x \notin C \text{ and } \|x - y\| = 1 \text{ for some } y \in C\}.$$

This observation leads to the following standard formulation of the (non-homogeneous)
Dirichlet problem on the lattice: given some domain C ⊆ Z^d
and functions h = h(x), x ∈ C, and g = g(x), x ∈ ∂C, find a function V = V(x),
x ∈ C ∪ ∂C, which satisfies the equations

$$\Delta V(x) = -h(x), \quad x \in C; \qquad V(x) = g(x), \quad x \in \partial C.$$
If the domain C consists of finitely many points, then P_x{τ(∂C) < ∞} = 1 for all
x ∈ C, where τ(∂C) = inf{n ≥ 0 : X_n ∈ ∂C} (see Problem 8.8.12). By using the
method described earlier, one can show that the solution in the domain C ∪ ∂C is
unique and is given by the formula:

$$V_{\partial C}(x) = E_x\big[g(X_{\tau(\partial C)})\big] + E_x\Big[\sum_{k=0}^{\tau(\partial C)-1} h(X_k)\Big], \quad x \in C \cup \partial C.$$

Since in this case the domain C is finite, there is actually no need to suppose that
the functions h(x) and g(x) are non-negative. In particular, setting h = 0, one finds
that the only function on C ∪ ∂C, which is harmonic in C and equals g on ∂C, is
the function

$$V_{\partial C}(x) = E_x g(X_{\tau(\partial C)}).$$


We now turn to the homogeneous Dirichlet problem

$$\Delta V(x) = 0, \quad x \in C; \qquad V(x) = g(x), \quad x \in \partial C, \qquad (**)$$

treated on some unbounded domain C.

If d ≤ 2, by Pólya's theorem (see [P §8.8]) one must have P_x{τ(∂C) < ∞} = 1,
which, by using the same reasoning as in the case of finite domains, leads to the
following result: if the function g = g(x) is bounded, then, in the class of bounded
functions on C ∪ ∂C, the solution to (**) is unique and is given by

$$V_{\partial C}(x) = E_x g(X_{\tau(\partial C)}).$$


One must realize that even with bounded g = g(x) there could be multiple
solutions in the class of unbounded functions on C ∪ ∂C. A classical example of
such a situation is the following. In dimension d = 1 consider the domain C = Z \
{0}, for which ∂C = {0}. Setting g(0) = 0, it is easy to see that every function
V(x) = αx, α ∈ R (automatically unbounded when α ≠ 0), is a solution to the
Dirichlet problem, i.e., one has ΔV(x) = 0, x ∈ Z \ {0}, and V(0) = g(0).

In dimension d ≥ 3 the question of existence and uniqueness of the solution
to the Dirichlet problem ΔV(x) = 0, x ∈ C, and V(x) = g(x), x ∈ ∂C, even
in the class of bounded functions V(x), x ∈ C ∪ ∂C, depends on the condition
P_x{τ(∂C) < ∞} = 1, for all x ∈ C. If this condition holds and g = g(x)
is bounded, then one can claim that there is precisely one solution in the class of
bounded functions on C ∪ ∂C, which is given by V_{∂C}(x) = E_x g(X_{τ(∂C)}), for all
x ∈ C ∪ ∂C.

However, if the condition P_x{τ(∂C) < ∞} = 1, x ∈ C, does not hold, then,
assuming that g = g(x), x ∈ ∂C, is bounded, every (automatically bounded)
function of the form

$$V_{\partial C}^{(\alpha)}(x) = E_x\big[I(\tau(\partial C) < \infty)\, g(X_{\tau(\partial C)})\big] + \alpha P_x\{\tau(\partial C) = \infty\},$$

for all choices of α ∈ R, is a solution to the Dirichlet problem ΔV(x) = 0, x ∈ C,
and V(x) = g(x), x ∈ ∂C (see, for example, [75, Theorem 1.4.9]).

• The discussion in [P Chap. 8] of the various aspects of the classification of
Markov chains with countable state space follows the tradition established during
the 1930s in the works of Kolmogorov, Fréchet, Döblin and others, which is based on
the idea that any classification must reflect, on the one hand, the algebraic properties
of the one-step transition probability matrix, and, on the other hand, the asymptotic
properties of the transition probabilities as the time grows to ∞. Since then notions
like

essential and inessential states,
reachable and communicating states,
irreducibility and periodicity,

which are determined from the properties of the one-step transition matrix, and
notions like

transience and recurrence,
positive recurrent and null recurrent states,
invariant (stationary) distributions,
ergodic distributions and ergodic theorems,

which are determined from the limiting behavior of the transition probabilities, have
become central in the theory of Markov chains.

Gradually, it became clear that it is more convenient to study the asymptotic 
properties of Markov chains by utilizing the tools of potential theory, the basic 
ingredients of which (e.g., the notion of potential, the notions of harmonic and 
excessive functions, and some basic results involving those notions) were introduced 
above. 

The exposition in [ P Chap. 8] makes it clear that the primary tool for studying 
the limiting behavior of Markov chains is a method that would be rather natural to 
call “the method of regenerating cycles,” as is explained next. 

Let x ∈ E be any state and let (σ_x^k)_{k≥0} be the sequence of “regenerating Markov
times,” which is constructed as follows: first, define σ_x^0 = 0 and σ_x^1 = σ_x, where

$$\sigma_x = \inf\{n > 0 : X_n = x\};$$

then define by induction, for any k ≥ 2,

$$\sigma_x^k = \inf\{n > \sigma_x^{k-1} : X_n = x\} \quad \text{on the set } \{\sigma_x^{k-1} < \infty\}.$$

Equivalently, one can write

$$\sigma_x^k = \begin{cases} \sigma_x^{k-1} + \sigma_x \circ \theta_{\sigma_x^{k-1}}, & \text{if } \sigma_x^{k-1} < \infty; \\ \infty, & \text{if } \sigma_x^{k-1} = \infty. \end{cases}$$

The following properties explain the term “regenerating times” and its connection
with the “regenerating cycles”:

1. On the set {σ_x^k < ∞} one has X_{σ_x^k} = x.

2. On the set {σ_x^k < ∞} the sequence (X_{σ_x^k + n})_{n≥0} is independent of the random
vector (X_0, X_1, ..., X_{σ_x^k − 1}), relative to the measure P_x.

3. If σ_x^k(ω) < ∞ for all ω ∈ E^∞, then, relative to P_x, the distribution of the
sequence (X_{σ_x^k + n})_{n≥0} is the same as the distribution of the sequence (X_n)_{n≥0}.

4. If σ_x^k(ω) < ∞ for all ω ∈ E^∞, then, relative to P_x, the “regenerating cycles”

$$(X_0, X_1, \ldots, X_{\sigma_x^1 - 1}), \ \ldots, \ (X_{\sigma_x^{k-1}}, X_{\sigma_x^{k-1}+1}, \ldots, X_{\sigma_x^k - 1})$$

are independent.

5. P_x{σ_x^k < ∞} = P_x{σ_x^{k−1} < ∞} P_x{σ_x < ∞} and, therefore, P_x{σ_x^n < ∞} =
[P_x{σ_x < ∞}]^n.

6. Setting N_x = Σ_{n≥0} I_{{x}}(X_n) (in the notation introduced previously, this is
nothing but N_{{x}}, the number of visits to state x), the expected number of visits
E_x N_x (which is E_x N_{{x}} = G(x, x)) is given by

$$E_x N_x = 1 + \sum_{n \ge 1} P_x\{\sigma_x^n < \infty\} = 1 + \sum_{n \ge 1} [P_x\{\sigma_x < \infty\}]^n.$$

7. The above relations entail

$$P_x\{\sigma_x < \infty\} = 1 \iff E_x N_x = \infty \iff P_x\{N_x = \infty\} = 1,$$
$$P_x\{\sigma_x < \infty\} < 1 \iff E_x N_x < \infty \iff P_x\{N_x < \infty\} = 1.$$

8. For any y ≠ x one has

$$G(x, y) = P_x\{\sigma_y < \infty\}\, G(y, y),$$

or, equivalently,

$$E_x N_y = P_x\{\sigma_y < \infty\}\, E_y N_y.$$

9. If P_x{σ_x^k < ∞} = 1 for all k ≥ 1, then the sequence of “regenerating periods,”
(σ_x^k − σ_x^{k−1})_{k≥1}, is a sequence of independent and identically distributed random
variables.

Recall that according to the definitions in [P §8.5] the state x ∈ E is called

recurrent, if P_x{σ_x < ∞} = 1,

transient, if P_x{σ_x < ∞} < 1.




Since (see [P §8.5, Theorem 1])

$$P_x\{\sigma_x < \infty\} = 1 \iff P_x\{X_n = x \text{ i.o.}\} = 1,$$
$$P_x\{\sigma_x < \infty\} < 1 \iff P_x\{X_n = x \text{ i.o.}\} = 0,$$

the state x ∈ E is (or may be called that by definition)

recurrent, if P_x{X_n = x i.o.} = 1,

transient, if P_x{X_n = x i.o.} = 0.

In fact, the intrinsic meaning of the terms “recurrent” and “transient” is better
reflected in the relations “P_x{X_n = x i.o.} = 1” and “P_x{X_n = x i.o.} = 0,”
as opposed to the equivalent relations “P_x{σ_x < ∞} = 1” and “P_x{σ_x < ∞} < 1.”
Indeed, “recurrence of x” is to be understood as “eventual return to x after every
visit to x” and “transience of x” is to be understood as “non-recurrence of x,” i.e.,
as “non-return after some visit to x.”

Thus, the recurrence of the state x is equivalent to each of the following
properties:

$$P_x\{X_n = x \text{ i.o.}\} = 1, \quad \text{or} \quad P_x\{N_x = \infty\} = 1, \quad \text{or} \quad E_x N_x = \infty,$$

while the transience is equivalent to each of the properties:

$$P_x\{X_n = x \text{ i.o.}\} = 0, \quad \text{or} \quad P_x\{N_x < \infty\} = 1, \quad \text{or} \quad E_x N_x < \infty.$$


• The use of potential theory and, in particular, the technique of “regenerating
cycles” allows one to develop a more or less complete understanding of the structure 
of the invariant measures and distributions (i.e., probability distributions). The 
exposition below follows [85]. 

Recall that any (one-step) transition probability matrix $P = \|p_{xy}\|$, $x, y \in E$,
gives rise to the linear operator $Pf$, which acts on functions $f \in L^0(E, \mathcal{E}; \mathbb{R}_+)$
according to the rule

$$(Pf)(x) = \sum_{y \in E} p_{xy} f(y), \quad x \in E,$$

understood as

(matrix $P$) ⊗ (vector-column $f$) = (vector-column $Pf$).
Let $q = q(A)$, $A \subseteq E$, be any non-trivial (i.e., not identically $0$ or $\infty$) measure
defined on the subsets of some countable set $E$. Such a measure is completely
determined by its values, $q(\{x\})$, on the singleton sets $\{x\}$, $x \in E$ (for the sake
of simplicity we will write $q(x)$ instead of $q(\{x\})$).

Let $\mathcal{M}_+$ denote the space of such measures $q$ and let $P$ stand for the linear
operator that transforms measures from $\mathcal{M}_+$ into measures from $\mathcal{M}_+$ according
to the rule $\mathcal{M}_+ \ni q \rightsquigarrow qP \in \mathcal{M}_+$, where $qP$ is the measure

$$qP(y) = \sum_{x \in E} q(x)\, p_{xy}, \quad y \in E,$$

i.e., $qP \in \mathcal{M}_+$ is understood as the vector

(vector-row $qP$) = (vector-row $q$) ⊗ (matrix $P$).


The measure $q \in \mathcal{M}_+$ is said to be invariant or stationary for the Markov chain
with operator $P$ if $qP = q$. The measure $q \in \mathcal{M}_+$ is said to be excessive, or
$P$-excessive, if $qP \le q$.

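Next, consider the bilinear form

$$(q, f) = \sum_{x \in E} q(x) f(x), \quad f \in L^0(E, \mathcal{E}; \mathbb{R}_+), \quad q \in \mathcal{M}_+.$$

The following duality relation is easy to verify:

$$(q, Pf) = (qP, f), \quad f \in L^0(E, \mathcal{E}; \mathbb{R}_+), \quad q \in \mathcal{M}_+.$$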


Essentially, the above relation says that the action of the operator P on functions 
and the action of the operator P on measures can be interchanged. 
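The two actions of $P$ and the duality relation can be illustrated on a finite state space, where functions are column vectors and measures are row vectors. The following is a minimal sketch; the matrix `P`, the function `f`, the measure `q`, and the helper names are hypothetical choices made only for illustration.

```python
from fractions import Fraction as F

# Hypothetical 3-state transition matrix (each row sums to 1).
P = [[F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(0),    F(1, 2)]]

def apply_to_function(P, f):
    """(Pf)(x) = sum_y p_xy f(y): matrix times column vector."""
    return [sum(P[x][y] * f[y] for y in range(len(f))) for x in range(len(P))]

def apply_to_measure(q, P):
    """(qP)(y) = sum_x q(x) p_xy: row vector times matrix."""
    return [sum(q[x] * P[x][y] for x in range(len(q))) for y in range(len(P))]

def pairing(q, f):
    """Bilinear form (q, f) = sum_x q(x) f(x)."""
    return sum(qx * fx for qx, fx in zip(q, f))

f = [F(3), F(0), F(5)]   # an arbitrary non-negative function on E
q = [F(2), F(1), F(4)]   # an arbitrary measure on E

lhs = pairing(q, apply_to_function(P, f))   # (q, Pf)
rhs = pairing(apply_to_measure(q, P), f)    # (qP, f)
print(lhs, rhs)

# A measure is invariant when qP = q; this one is invariant for the P above.
q_inv = [F(1), F(1), F(3, 2)]
print(apply_to_measure(q_inv, P) == q_inv)
```

Both pairings are the same double sum $\sum_{x,y} q(x)\, p_{xy}\, f(y)$ taken in a different order, which is all the duality relation asserts.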

[P §8.6, Theorem 2] shows that, in the case of irreducible (there is only one
class) and positive recurrent Markov chains with countable state space, an invariant
distribution exists, is unique, and is given by

$$q(x) = [E_x \sigma_x]^{-1}, \quad x \in E,$$

where $\sigma_x = \inf\{n \ge 1 : X_n = x\}$ is the time of the first return to $x$. (Note that
$1 \le E_x \sigma_x < \infty$, $x \in E$.)
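For a concrete instance of the formula $q(x) = [E_x \sigma_x]^{-1}$, one may consider a two-state chain on $E = \{0, 1\}$ with $p_{01} = a$ and $p_{10} = b$, where $0 < a, b \le 1$. This example is an illustration added here under these assumptions and is not taken from [P].

```latex
% Two-state chain: p_{00} = 1-a, p_{01} = a, p_{10} = b, p_{11} = 1-b.
% From state 1 the passage time to state 0 is geometric with parameter b,
% so its mean equals 1/b (and symmetrically 1/a from state 0 to state 1).
E_0 \sigma_0 = (1-a)\cdot 1 + a\Bigl(1 + \frac{1}{b}\Bigr) = \frac{a+b}{b},
\qquad
E_1 \sigma_1 = (1-b)\cdot 1 + b\Bigl(1 + \frac{1}{a}\Bigr) = \frac{a+b}{a},
\\
q(0) = \frac{b}{a+b}, \qquad q(1) = \frac{a}{a+b}, \qquad q(0) + q(1) = 1,
\\
qP(0) = q(0)(1-a) + q(1)\,b = \frac{b(1-a) + ab}{a+b} = \frac{b}{a+b} = q(0).
```

The last line confirms invariance at state $0$; invariance at state $1$ then follows because both $q$ and $qP$ have total mass $1$.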

As we are about to show, by using the characteristics of the first “regenerating
cycle” the result about the existence and the structure of the invariant measures can
be established for arbitrary irreducible and recurrent Markov chains, without the
requirement for positive recurrence.

More specifically, one can claim the following: 


Any irreducible and recurrent Markov chain $X = (X_n)_{n \ge 0}$, which has a countable state
space $E$, admits an invariant measure $q = q(A)$, $A \subseteq E$, which is non-trivial, in that
$q(E) > 0$, $q(x) \not\equiv \infty$, and $0 < q(x) < \infty$ for any state $x \in E$. This measure is unique up
to a multiplicative constant.

To prove the above statement, notice that for any fixed state $x^\circ \in E$, one can
always construct an invariant measure, say $q^\circ$, with the property $q^\circ(x^\circ) = 1$. For
example, one can set

$$q^\circ(x) = E_{x^\circ} \sum_{k=0}^{\sigma_{x^\circ} - 1} I_{\{x\}}(X_k), \quad x \in E,$$

where $\sigma_{x^\circ} = \inf\{n \ge 1 : X_n = x^\circ\}$. In order to show that the above measure is
indeed invariant (and that therefore invariant measures exist), it would be enough to
show that for any function $f \in L^0(E, \mathcal{E}; \mathbb{R}_+)$ one has

$$(q^\circ P, f) = (q^\circ, f).$$


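In conjunction with the strong Markov property established in Problem 8.2.13, the
last relation follows from the following chain of identities:

$$\begin{aligned}
(q^\circ P, f) = (q^\circ, Pf)
&= E_{x^\circ}\Bigl[\sum_{k=0}^{\sigma_{x^\circ} - 1} (Pf)(X_k)\Bigr]
 = E_{x^\circ}\Bigl[\sum_{k=0}^{\sigma_{x^\circ} - 1} E_{X_k} f(X_1)\Bigr] \\
&= \sum_{k \ge 0} E_{x^\circ}\bigl[I_{\{k < \sigma_{x^\circ}\}}\, E_{X_k} f(X_1)\bigr]
 = \sum_{k \ge 0} E_{x^\circ}\bigl\{I_{\{k < \sigma_{x^\circ}\}}\, E_{x^\circ}[f(X_1) \circ \theta_k \mid \mathcal{F}_k]\bigr\} \\
&= \sum_{k \ge 0} E_{x^\circ}\bigl\{E_{x^\circ}[I_{\{k < \sigma_{x^\circ}\}}\, f(X_1) \circ \theta_k \mid \mathcal{F}_k]\bigr\}
 = \sum_{k \ge 0} E_{x^\circ}\bigl[I_{\{k < \sigma_{x^\circ}\}}\, f(X_1) \circ \theta_k\bigr] \\
&= E_{x^\circ} \sum_{k \ge 0} I_{\{k < \sigma_{x^\circ}\}}\, f(X_{k+1})
 = E_{x^\circ} \sum_{l=1}^{\sigma_{x^\circ}} f(X_l)
 = E_{x^\circ} \sum_{k=0}^{\sigma_{x^\circ} - 1} f(X_k) \\
&= (q^\circ, f).
\end{aligned}$$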
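For a finite irreducible chain the cycle measure $q^\circ$ can be computed exactly, which gives a direct check of the invariance just proved. The sketch below rests on the standard observation that, for $y \ne x^\circ$, the expected number of visits to $y$ before the first return to $x^\circ$ equals $\sum_a p_{x^\circ a}\,(I - Q)^{-1}(a, y)$, where $Q$ is $P$ with the row and column of $x^\circ$ removed. The matrix `P` and the helper names `inverse` and `cycle_measure` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F

# Hypothetical irreducible 3-state chain.
P = [[F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(0),    F(1, 2)]]

def inverse(A):
    """Exact matrix inverse by Gauss-Jordan elimination over the rationals."""
    n = len(A)
    M = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def cycle_measure(P, x0):
    """q0(x): expected number of visits to x during one cycle started at x0."""
    n = len(P)
    others = [s for s in range(n) if s != x0]
    # I - Q, with Q the restriction of P to the states different from x0
    IQ = [[F(int(a == b)) - P[a][b] for b in others] for a in others]
    G = inverse(IQ)  # Green function of the chain killed on hitting x0
    q = [F(0)] * n
    q[x0] = F(1)     # time 0 is the only visit to x0 within the cycle
    for j, y in enumerate(others):
        q[y] = sum(P[x0][a] * G[i][j] for i, a in enumerate(others))
    return q

q0 = cycle_measure(P, 0)
q0P = [sum(q0[x] * P[x][y] for x in range(3)) for y in range(3)]
print(q0, q0P)
print(sum(q0))  # total mass of q0, i.e. the expected cycle length
```

Summing the indicators over $x$ shows that the total mass $q^\circ(E)$ equals $E_{x^\circ} \sigma_{x^\circ}$, so dividing $q^\circ$ by this mass yields the invariant probability distribution whenever the mass is finite.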


In addition to the normalization $q^\circ(x^\circ) = 1$, the measure $q^\circ$ constructed above also
has the property $0 < q^\circ(x) < \infty$ for all $x \in E$. This last property follows from the
following simple fact about excessive measures.

Suppose that the underlying Markov chain is irreducible and that the measure
$q \in \mathcal{M}_+$ is excessive, i.e., $qP \le q$. If there is a state $x^\circ \in E$ for which $q(x^\circ) = 0$
(note that $q(x^\circ) < \infty$), then for any $x \in E$ one must have $q(x) = 0$ (note that
$q(x) < \infty$). To see why this claim can be made, observe that for any $x \ne x^\circ$ one
can find an integer $n \ge 1$, for which $p^{(n)}_{x x^\circ} > 0$. As a result, the relations

$$0 = q(x^\circ) \ge \sum_{y \in E} q(y)\, p^{(n)}_{y x^\circ} \ge q(x)\, p^{(n)}_{x x^\circ}$$

imply that $q(x) = 0$.

We will now show that, up to a positive multiplicative constant, $q^\circ$ is the only
non-trivial invariant measure. For that purpose, suppose that $q$ is some invariant
(and, therefore, also excessive) measure with $0 < q(x) < \infty$ for all $x \in E$. Set

$$f(x) = \frac{q(x)}{q^\circ(x)}, \quad x \in E,$$


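and define the (dual of $p_{xy}$) function $\tilde{p}_{xy} = \frac{q^\circ(y)}{q^\circ(x)}\, p_{yx}$. Since for every fixed $x \in E$
one has

$$\sum_{y \in E} \tilde{p}_{xy} = \frac{1}{q^\circ(x)} \sum_{y \in E} q^\circ(y)\, p_{yx} = \frac{q^\circ(x)}{q^\circ(x)} = 1,$$

the matrix $\tilde{P} = \|\tilde{p}_{xy}\|$ can be treated as a transition probability matrix and one can
write

$$(\tilde{P} f)(x) = \sum_{y \in E} \tilde{p}_{xy} f(y)
 = \sum_{y \in E} \frac{q^\circ(y)}{q^\circ(x)}\, p_{yx}\, \frac{q(y)}{q^\circ(y)}
 = \frac{1}{q^\circ(x)} \sum_{y \in E} q(y)\, p_{yx}
 = \frac{q(x)}{q^\circ(x)} = f(x).$$

The function $f = f(x)$ is therefore $\tilde{P}$-harmonic. Since $\tilde{p}_{xy} = \frac{q^\circ(y)}{q^\circ(x)}\, p_{yx}$ by
definition, for every $n \ge 1$ one must have

$$\tilde{p}^{(n)}_{xy} = \frac{q^\circ(y)}{q^\circ(x)}\, p^{(n)}_{yx},$$

which entails the following relation between the respective Green functions:

$$\tilde{G}(x, y) = \frac{q^\circ(y)}{q^\circ(x)}\, G(y, x).$$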
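The properties of the dual matrix can be verified directly on a small example. The sketch below takes a hypothetical three-state chain together with an invariant measure of it normalized at state 0, builds the dual matrix from the defining formula, and checks that its rows sum to one and that the two-step relation holds. The names `P`, `q0`, `dual` and `matmul` are illustrative assumptions.

```python
from fractions import Fraction as F

# Hypothetical irreducible chain and an invariant measure q0 with q0[0] = 1.
P = [[F(0),    F(1, 2), F(1, 2)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(0),    F(1, 2)]]
q0 = [F(1), F(1), F(3, 2)]

def dual(P, q):
    """Dual matrix with entries q(y) * p_yx / q(x)."""
    n = len(P)
    return [[q[y] * P[y][x] / q[x] for y in range(n)] for x in range(n)]

def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P_dual = dual(P, q0)
row_sums = [sum(row) for row in P_dual]   # each equals 1 because q0 P = q0

# Two-step version of the relation between dual and original transition probabilities.
P2, P2_dual = matmul(P, P), matmul(P_dual, P_dual)
two_step_ok = all(P2_dual[x][y] == q0[y] * P2[y][x] / q0[x]
                  for x in range(3) for y in range(3))

# A constant multiple of q0 gives a constant, hence harmonic, ratio f = q / q0.
q = [2 * v for v in q0]
f = [qx / q0x for qx, q0x in zip(q, q0)]
Pf_dual = [sum(P_dual[x][y] * f[y] for y in range(3)) for x in range(3)]
print(row_sums, two_step_ok, Pf_dual == f)
```

The row sums equal one precisely because `q0` is invariant for `P`; with a measure that is only excessive the dual matrix would in general be substochastic.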


The last two relations imply that if the Markov chain $X = (X_n)_{n \ge 0}$, with operator
$P$, happens to be irreducible and recurrent, then the dual chain $\tilde{X} = (\tilde{X}_n)_{n \ge 0}$,
with operator $\tilde{P}$, must be irreducible and recurrent, too. However, if $f = f(x)$ is
any (automatically non-negative) excessive function (in particular, if $f = f(x)$ is
harmonic), then the sequence $(f(X_n))_{n \ge 0}$ must be a non-negative supermartingale
relative to the measure $P_\pi$, for any initial distribution $\pi$. This property was
mentioned earlier. We also noted that every non-negative supermartingale converges
almost everywhere, and, therefore, the limit $\lim_n f(X_n)$ ($= Z$)
must exist $P_\pi$-almost everywhere, too. It is easy to see that if the chain is irreducible
and recurrent, then for any two states, $x$ and $y \ne x$, one can claim that $X_n$ visits
both $x$ and $y$ infinitely many times, so that $f(X_n)$ takes each of the values $f(x)$ and $f(y)$
infinitely often. Applied to the dual chain $\tilde{X}$ and the $\tilde{P}$-harmonic function
$f = q/q^\circ$, this shows that $f(x) = f(y)$ for every $x, y \in E$, so that $f(x) = \text{const}$.

We have thus established that any other invariant measure $q$, such that
$0 < q(x) < \infty$, $x \in E$, must be a multiple, with some positive constant factor, of the
measure $q^\circ$.

The result that we just established entails the following feature of all irreducible
and positive recurrent Markov chains, which was mentioned earlier: the only invariant
probability distribution $q^\circ = (q^\circ(x), x \in E)$ is given by $q^\circ(x) = [E_x \sigma_x]^{-1}$, $x \in E$.

• We now turn to certain ergodic theorems for Markov chains with countable state
spaces, i.e., theorems about convergence almost surely as $n \to \infty$ of quantities of
the form $\frac{1}{n} \sum_{k=0}^{n-1} f(X_k)$, or, more generally, of the form

$$\frac{\sum_{k=0}^{n} f(X_k)}{\sum_{k=0}^{n} g(X_k)}$$

for certain classes of functions $f$ and $g$. We will again rely on the technique of
“regeneration cycles.”

Let $X = (X_n)_{n \ge 0}$ be some irreducible and recurrent Markov chain with
countable state space $E$ and invariant measure $q^\circ(x)$, such that $0 < q^\circ(x) < \infty$
for all $x \in E$ and $q^\circ(x^\circ) = 1$ for some fixed state $x^\circ$.

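Next, suppose that $f = f(x)$ and $g = g(x)$ are two functions from the class
$L^1(q^\circ)$, i.e., $f = f(x)$ and $g = g(x)$ are two functions on $E$ chosen so that
$\sum_{x \in E} |f(x)|\, q^\circ(x) < \infty$ and $\sum_{x \in E} |g(x)|\, q^\circ(x) < \infty$, and set

$$Y_0 = \sum_{k=0}^{\sigma_{x^\circ} - 1} f(X_k) \quad \text{and} \quad
Y_m = \sum_{k=\sigma_{x^\circ}^m}^{\sigma_{x^\circ}^{m+1} - 1} f(X_k), \quad m \ge 1.$$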


By the very definition of the invariant measure $q^\circ$ we have

$$E_{x^\circ} Y_0 = (q^\circ, f),$$

and, due to the Markov property, for any initial distribution $\pi$ one must have

$$E_\pi Y_m = E_\pi\bigl[E_{X_{\sigma_{x^\circ}^m}}(Y_0)\bigr] = E_{x^\circ} Y_0 = (q^\circ, f).$$


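Thus, relative to the measure $P_\pi$, the random variables $Y_1, Y_2, \ldots$ are independent
and identically distributed, and, furthermore, have the property $E_\pi Y_m = (q^\circ, f)$
($< \infty$), $m \ge 1$. The strong law of large numbers now implies that for every initial
distribution $\pi$ one must have ($P_\pi$-a.e.)

$$\frac{1}{n} \sum_{k=0}^{\sigma_{x^\circ}^n - 1} f(X_k) = \frac{Y_0}{n} + \frac{Y_1 + \cdots + Y_{n-1}}{n} \to (q^\circ, f) \quad \text{as } n \to \infty,$$

and, assuming that $(q^\circ, g) \ne 0$, one must have (again, for every $\pi$)

$$\frac{\sum_{k=0}^{\sigma_{x^\circ}^n - 1} f(X_k)}{\sum_{k=0}^{\sigma_{x^\circ}^n - 1} g(X_k)} \to \frac{(q^\circ, f)}{(q^\circ, g)} \quad \text{as } n \to \infty \quad (P_\pi\text{-a.e.}).$$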


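Next, let $\nu_n = \sum_{k=1}^{n} I(X_k = x^\circ)$, and notice that, since the chain is recurrent, one
can claim that $\nu_n \to \infty$ as $n \to \infty$ ($P_\pi$-a.e.). Since $\sigma_{x^\circ}^{\nu_n} \le n < \sigma_{x^\circ}^{\nu_n + 1}$, the above
convergence entails the ergodic theorem for ratios:

$$\frac{\sum_{k=0}^{n} f(X_k)}{\sum_{k=0}^{n} g(X_k)} \to \frac{(q^\circ, f)}{(q^\circ, g)} \quad \text{as } n \to \infty \quad (P_\pi\text{-a.e.}).$$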


Finally, suppose that the Markov chain under consideration is irreducible and
positive recurrent. In this case one can replace the measure $q^\circ$ with the probability
distribution $\pi^\circ = (\pi^\circ(x), x \in E)$, chosen so that $\pi^\circ(x) = 1/(E_x \sigma_x)$, and, as a
result, arrive at the following ergodic theorem (for irreducible and positive recurrent
Markov chains):

$$\frac{1}{n} \sum_{k=0}^{n-1} f(X_k) \to (\pi^\circ, f) \quad \text{as } n \to \infty \quad (P_\pi\text{-a.e.}),$$

for any initial distribution $\pi$ (in particular, for $\pi = \pi^\circ$).
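Both ergodic theorems can be observed in a simple simulation. The sketch below simulates a hypothetical three-state chain whose invariant measure normalized at state 0 is $(1, 1, 3/2)$, so that the invariant distribution is $(2/7, 2/7, 3/7)$. It then compares time averages and a ratio of occupation counts with the predicted limits. The matrix, the seed, the run length and the helper names are all assumptions made for illustration, and the agreement is only approximate for finite $n$.

```python
import random

# Hypothetical irreducible, positive recurrent chain on {0, 1, 2}.
P = [[0.0, 0.5, 0.5],
     [0.25, 0.5, 0.25],
     [0.5, 0.0, 0.5]]
Q0 = [1.0, 1.0, 1.5]                 # invariant measure with Q0[0] = 1
PI0 = [v / sum(Q0) for v in Q0]      # invariant distribution (2/7, 2/7, 3/7)

def step(x, rng):
    """Draw the next state from row x of P by inverting the cumulative sums."""
    u, acc = rng.random(), 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return y
    return len(P) - 1  # guard against floating-point round-off

def occupation_counts(n, x_start=0, seed=0):
    """Number of visits to each state among X_0, ..., X_{n-1}."""
    rng = random.Random(seed)
    counts = [0] * len(P)
    x = x_start
    for _ in range(n):
        counts[x] += 1
        x = step(x, rng)
    return counts

n = 200_000
counts = occupation_counts(n)
time_avg = [c / n for c in counts]   # (1/n) * sum of I_{x}(X_k), k < n
ratio = counts[2] / counts[0]        # f = I_{2}, g = I_{0}; limit is Q0[2] / Q0[0]
print(time_avg, PI0)
print(ratio, Q0[2] / Q0[0])
```

With indicator functions $f = I_{\{2\}}$ and $g = I_{\{0\}}$, the ratio theorem predicts the limit $(q^\circ, f)/(q^\circ, g) = q^\circ(2)/q^\circ(0)$, which does not require normalizing the invariant measure.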